Let's recap first:

Code:

0 K/4 K/2 3K/4 K
╪═══════╪═══════╪═══════╪═══════╪
│ B 25% │ C 50% │ D 25% │

To generate pseudorandom numbers with a desired discrete distribution, you generate a pseudorandom integer *x*=[0,*K*) (not including *K*, i.e. *x*=[0,*K*-1]). You have an array filled with the cumulative probability of previous results, scaled to 100%=*K*, starting with a zero for the first option, and adding a 100% at end, to make the search as simple and as fast as possible.

Here, it would be

*probability* = [ 0, *K*/4, 3**K*/4, *K* ]

To find the result our pseudorandom *x* refers to, we find the index *i* in *probability* for which

*probability*[*i*] <= *x* < *probability*[*i*+1]

Here, *i*=0 would be Bob, *i*=1 would be Cindy, and *i*=2 would be David.

(If we had a continuous distribution, we'd have a nondecreasing cumulative probability function 0 <= *P*(*y*) <= 1, which would tell the cumulative probability of all results *x*<=*y*. In this case, we could construct the inverse cumulative probability function *P*^{-1}(*p*)=*x*, generate a pseudorandom number 0<=*p*<=1, and feed it to the inverse cumulative probability function to get the resulting value *x*. This is how you generate values according to a desired Gaussian distribution, for example.)

In the discrete case here, a binary search to find the index *i* in *probability* takes log_{2}*N*, where *N* is the number of possible results. If *N* is ten million, you need 24 iterations of the search loop to find the result. While that looks very acceptable on the face of it, the cache behaviour -- you end up accessing your forty-to-eighty-million-byte array here and there, and it's not going to be all in cache all the time, so the CPU ends up having to read parts of it from RAM -- is likely to make it slow.

Cyberfish is asking for any ideas to make that faster. Three options immediately come to my mind.

If all the probabilities can be approximated to sufficient precision using *p*_{i}/*K*, where *p*_{i} is an integer referring to the probability of result *i*, you could use an array of length *K*, with each entry referring to the result *i*. Generating a pseudorandom integer *x*, 0<=*x*<*K*, you'd get the resulting *i* using simply *result*[*x*]. Obviously, this only works when the number of results *N*<=*K*, and for any kind of precision for the probabilities, you need a high *K*/*N* ratio. If *N* is already ten million, this is unlikely to work for Cyberfish.

When *N* is already large as in this particular case, we only need to get rid of the initial steps of the binary search (because those are the ones that cause the nasty cache behaviour), to get reasonable performance. Fortunately, we can use a very simple indexing approach to limit our binary search to relatively short section of the probability array, depending on the exact values filled in it. (It turns out that if you can, interleaving improbable and probable results would give near minimum-length binary search ranges, and thus near-optimum performance.)

The idea is that you create a much smaller discrete probability array of size 1+*K*/2[sup]*L*[/I] -- using a power of two so that you can get the index to this array by *x*>>*L*. Again, the first entry in the array is zero, and the final (additional) entry is *K*, the index of the last (additional) entry in our *probability* array (referring to 100% probability). You want this array to be small enough to stay reasonably hot in your caches; something like a few thousand of entries, typically. Benchmarking on the target machines is definitely a good idea here.

When you generate the pseudorandom number *x*, 0<=*x*<*K*, the section of the *probability* array you need to search is between *lookup*[*x*>>*L*] and *lookup*[(*x*>>*L*)+1], excluding the upper limit. This is a single (two-entry) probe into the array, and should in most cases reduce the number of iterations needed in the binary search. The number of iterations in the binary search is however less important than the fact that we reduce the memory area involved i the binary search, hopefully getting better performance. Unfortunately, when you generate pseudorandom numbers near the order of *K*, you will have to access the entire *probability* array anyway. Which brings us to the third option:

The third option is surprisingly the most likely to get you best performance, and needs ridiculously little added code:

*Use a cache. Generate **K* values -- not exactly *K*, but of the same order, or at least a few million values, at a time. This way, the *probability* array gets cache-hot, reducing the amortized cost of generating each value. Remember, the "slowness" is caused by cache behaviour; otherwise the algorithm is pretty darn fast already (we're talking about one to three clock cycles per binary search step on current Intel/AMD processors). Pick off the numbers from the cache as is, and only when empty, generate a new batch.