Given a list of pairs (value and probability), what's a fast way to draw many samples from the distribution?
For example, the input could be something like -
<"bob", 0.25>
<"cindy", 0.5>
<"david", 0.25>
meaning for each sample, I want 25% chance of picking Bob, 50% chance of picking Cindy, and 25% chance of picking David.
I can calculate the cumulative probability of each point, and do a binary search for each sample, but that's O(mlog(n)), where m is the number of samples I want, and n is the number of states.
In my case, both m and n are on the order of 10 million, so it would take quite a while.
Is there a faster way?
The probabilities are mostly quite similar.