by Michele Laurelli
Decoding algorithm keeping top B most likely sequences at each step.
Explores multiple hypotheses simultaneously. Beam width B controls quality-speed tradeoff. Produces more coherent but less diverse outputs.
Beam width 5 for translation
Deterministic decoding
Machine translation