By scoring candidates on their total probability over multiple time steps, rather than one step at a time, you keep the sampling from accidentally making one bad choice and being stuck with that choice for the rest of the sequence.
Prune (diagram): at each step, keep only the most probable partial sequences and drop the rest. This works well for generating good sequences from your model.
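To make the idea concrete, here is a minimal sketch of scoring whole partial sequences and pruning to the best few. The notes do not name the algorithm, so treating it as a small beam search is an assumption. The names `beam_search`, `toy_model`, and `beam_width`, along with the probability table, are hypothetical and chosen only so that the most likely first token leads to a worse overall sequence.

```python
import math

def beam_search(next_log_probs, start, steps, beam_width):
    """Keep the `beam_width` best partial sequences by total log-probability."""
    beams = [(0.0, [start])]  # (total log-prob, tokens so far)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            # Expand each surviving sequence by every possible next token.
            for token, logp in next_log_probs(seq).items():
                candidates.append((score + logp, seq + [token]))
        # Prune: keep only the top `beam_width` candidates by total score.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

def toy_model(seq):
    """Hypothetical next-token distribution (assumption, not a real model)."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"x": 0.55, "y": 0.45},
        "b": {"x": 0.9, "y": 0.1},
    }
    probs = table[seq[-1]]
    return {tok: math.log(p) for tok, p in probs.items()}

# beam_width=1 is greedy: it commits to the single best token at each step.
greedy = beam_search(toy_model, "<s>", steps=2, beam_width=1)[0]
# beam_width=2 keeps a second hypothesis alive across steps.
beam = beam_search(toy_model, "<s>", steps=2, beam_width=2)[0]

print(greedy[1], round(math.exp(greedy[0]), 2))
print(beam[1], round(math.exp(beam[0]), 2))
```

With this toy table, a width of 1 commits to "a" at the first step and ends with a sequence of probability 0.33, while a width of 2 keeps "b" alive long enough to find the sequence with probability 0.36. Log-probabilities are summed rather than multiplying raw probabilities, which avoids underflow on longer sequences.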