by Michele Laurelli
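A text generation sampling method that draws each next token from the smallest set of most-probable tokens whose cumulative probability exceeds a threshold P.
The size of this candidate set adapts to the shape of the probability distribution: only a few tokens are kept when the model is confident, and many when it is uncertain. This makes it more flexible than top-k, which always keeps a fixed number of tokens. Common values: P = 0.9 to 0.95.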
P=0.9 for diverse generation
Dynamic vocabulary selection
More adaptive than top-k, which uses a fixed token count
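
The selection rule in the definition can be sketched in a few lines of Python. This is a minimal illustration, not any library's implementation: the function names `top_p_filter` and `sample_top_p` are hypothetical, the input is assumed to be a list of next-token probabilities that already sums to 1, and the cutoff is taken as "cumulative probability reaches or exceeds P".

```python
import random


def top_p_filter(probs, p=0.9):
    """Return {token_index: renormalized_probability} for the smallest set of
    most-probable tokens whose cumulative probability reaches p.

    Hypothetical helper for illustration; assumes probs sums to 1.
    """
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept = []
    cumulative = 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # stop as soon as the threshold is reached
            break

    # Renormalize so the kept probabilities sum to 1 again.
    return {i: probs[i] / cumulative for i in kept}


def sample_top_p(probs, p=0.9, rng=random):
    """Draw one token index from the filtered (top-p) distribution."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# A confident distribution keeps one token; a flat one keeps all four.
peaked = top_p_filter([0.97, 0.01, 0.01, 0.01], p=0.9)
flat = top_p_filter([0.25, 0.25, 0.25, 0.25], p=0.9)
print(len(peaked), len(flat))
```

With the same P, the number of kept tokens differs between the two inputs, which is the dynamic behavior the entry describes; a top-k filter would keep the same count in both cases.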