Thread: Doubts about detecting the periodicity of pseudo-random functions

  1. #31
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Sebastiani View Post
    >> The distribution is a theoretical object which represents the actual probabilities of certain observations. But an array which tallies counts is only a measurement of this distribution, not the distribution itself.

    I was simply referring to the "frequency" distribution, then, I guess.
    Well, okay, as long as you realize that's totally useless then we'll let you do so. A uniform prng means that every number is equally likely to show up on any given pull. Some methods (such as an LCG, if I remember my theory correctly) do this by essentially shuffling all the numbers and giving them to you in that order, in which case, yes, you will get exactly equal frequencies. Most other methods will not, I expect, but that doesn't mean they aren't uniform prngs. (I mean, head on over to random.org and get some random bits and look at them if nothing else.)
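    To make the "shuffling" idea concrete, here's a minimal sketch of a full-period LCG (the parameters are purely illustrative ones chosen to satisfy the Hull-Dobell full-period conditions, not taken from any real library): over one period it hands out every value in its range exactly once, so the frequency counts come out exactly equal even though the output is completely predictable.
    Code:
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Toy full-period LCG: x' = (a*x + c) mod m.
    // m = 16, a = 5, c = 3 satisfy the Hull-Dobell conditions, so the
    // generator visits every value in [0, m) exactly once per period.
    int main()
    {
        const std::uint32_t m = 16, a = 5, c = 3;
        std::uint32_t x = 0;
        std::vector<int> count(m, 0);

        for (std::uint32_t i = 0; i < m; ++i)
        {
            x = (a * x + c) % m;
            ++count[x];
        }

        // Every count is exactly 1: perfectly "uniform" frequencies,
        // yet the sequence itself is completely predictable.
        for (std::uint32_t v = 0; v < m; ++v)
            std::cout << v << ": " << count[v] << '\n';
    }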

  2. #32
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> Well, okay, as long as you realize that's totally useless then we'll let you do so.

    How so?

    >> A uniform prng means that every number is equally likely to show up on any given pull.

    OK, so then you are not simply referring to the actual distribution of values over a period of N samplings, but the relative "uniformity" of each sample within the given range "thus far". Is that correct?

    >> Most other methods will not, I expect, but that doesn't mean they aren't uniform prngs.

    I don't know, I guess my definition of uniformity is slightly different from the conventional meaning, but at any rate, from my observations most RNGs produce a similar quality of uniformity, so to speak (in the context of the type of UD test I posted earlier). The Mersenne twister is interesting because it generally produces higher UD values and yet maintains a wide range, going from one value to another. I actually have a pretty promising candidate of my own that seems to perform comparably to the MT, but I still need to run a lot more tests to verify its properties.
    Code:
    #include <cmath>
    #include <complex>
    // Boolean NOT via Euler's formula: the exponent is i*pi when value is
    // false (e^(i*pi) = -1, real part < 0 -> true) and i*2*pi when value is
    // true (e^(i*2*pi) = 1, real part > 0 -> false).
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  3. #33
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Sebastiani View Post
    >> Well, okay, as long as you realize that's totally useless then we'll let you do so.

    How so?
    Because PRNGs that guarantee (say) that every number shows up exactly once in every n tries will yield uniform frequencies, but they don't model a uniform distribution very well at all (knowledge of the history gives you a great deal of information about the future, which it shouldn't).

    Quote Originally Posted by Sebastiani View Post
    >> A uniform prng means that every number is equally likely to show up on any given pull.

    OK, so then you are not simply referring to the actual distribution of values over a period of N samplings, but the relative "uniformity" of each sample within the given range "thus far". Is that correct?
    Nope. Distributions have nothing to do with the past and everything to do with the future: the definition of a uniform PRNG is merely that every number in the range is equally likely to be the next number generated. We can test PRNGs by looking at the frequencies they actually produce and seeing how likely it is that that specific outcome could have been generated by a properly-functioning uniform PRNG; i.e., we can get a level of confidence in how well the generator is working. But since this test works only on past history, it can't actually tell us anything at all about how the PRNG will behave in the future.

    Note: If it is really a requirement that each number show up exactly the same number of times as every other number, then you're really looking for a shuffle and not RNG.
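    A minimal sketch of that distinction (purely illustrative -- the function name is made up): drawing values by shuffling the whole range is what actually guarantees exactly equal counts, which is exactly why it's a shuffle rather than a simulation of independent uniform draws.
    Code:
    #include <algorithm>
    #include <numeric>
    #include <random>
    #include <vector>

    // Returns the values 0..n-1 in a random order. Each value appears
    // exactly once per pass -- guaranteed equal frequencies -- but the
    // draws are obviously not independent.
    std::vector<unsigned> shuffled_deck(unsigned n, unsigned seed)
    {
        std::vector<unsigned> deck(n);
        std::iota(deck.begin(), deck.end(), 0u);
        std::mt19937 gen(seed);
        std::shuffle(deck.begin(), deck.end(), gen);
        return deck;
    }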

  4. #34
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> the definition of a uniform PRNG is merely that every number in the range is equally likely to be the next number generated. We can test PRNGs by looking at the distributions they actually generate and seeing how likely it is that specific distribution could be generated by a properly-functioning PRNG

    Doh! But that's what *I'm* talking about! Look. If I run 65536 tests on a 16-bit RNG and obtain every possible value once then the distribution is perfectly uniform. We haven't considered the order that they were generated or anything else. Do we agree on that?
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  5. #35
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Sebastiani View Post
    >> the definition of a uniform PRNG is merely that every number in the range is equally likely to be the next number generated. We can test PRNGs by looking at the distributions they actually generate and seeing how likely it is that specific distribution could be generated by a properly-functioning PRNG

    Doh! But that's what *I'm* talking about! Look. If I run 65536 tests on a 16-bit RNG and obtain every possible value once then the distribution is perfectly uniform. We haven't considered the order that they were generated or anything else. Do we agree on that?
    No. I agree that the *observed frequency* is perfectly uniform. That adds credence to the claim that the original PRNG is uniform.

    I guess what I'm doing is making clear the distinction between
    "If the observed distribution is exactly equal, then the PRNG is very likely uniform"
    and
    "If the PRNG is uniform, then the observed distributions will be equal."
    The first is true, the second is oh so very false, and (at least it seemed to me as I was reading) that you were starting to head down that second path.

    If we had a "real" rng (ignoring what's practically possible with a computer -- something using white noise etc.) that generated 16-bit random numbers uniformly, and we did your 65536 tests, the probability of getting each number exactly once is 65536!/65536^65536 -- roughly 10^-28459, i.e. a decimal point followed by over twenty-eight thousand zeros before the first significant digit. And in fact, with such a "real" rng every specific sequence of outputs would be equally likely -- a run of 65536 zeros has the same probability as any other particular sequence. That is, after all, what uniform means -- each possibility is equally likely. The theoretical expected value for each frequency is one, to be sure, but with a standard deviation of 0.99999237, so there's plenty of room for variation. What something like chi-squared is doing is measuring how far our observed frequencies are from the theoretical expectations, and giving us a probability of being that far (or farther!) from the "true mean", assuming the distribution is correct.
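    As a sketch of that last step (illustrative only -- the function name and interface are made up for this example), the Pearson chi-squared statistic just totals up how far each observed count sits from the flat expectation; you would then compare the result against a chi-squared table with k - 1 degrees of freedom to get the confidence level.
    Code:
    #include <cstddef>
    #include <vector>

    // Pearson chi-squared statistic for observed counts against a uniform
    // expectation. A well-behaved uniform generator gives a value near the
    // number of degrees of freedom (k - 1).
    double chi_squared_uniform(const std::vector<std::size_t>& counts)
    {
        std::size_t total = 0;
        for (std::size_t c : counts) total += c;

        const double expected = static_cast<double>(total) / counts.size();
        double stat = 0.0;
        for (std::size_t c : counts)
        {
            const double d = c - expected;
            stat += d * d / expected;
        }
        return stat;
    }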

  6. #36
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> No. I agree that the *observed frequency* is perfectly uniform. That adds credence to the claim that the original PRNG is uniform. I guess what I'm doing is making clear the distinction between "If the observed distribution is exactly equal, then the PRNG is very likely uniform" and "If the PRNG is uniform, then the observed distributions will be equal." The first is true, the second is oh so very false, and (at least it seemed to me as I was reading) that you were starting to head down that second path.

    Well, naturally, the second statement would be very presumptuous and most likely incorrect. I think the confusion here is just a matter of terminology: I have been using the phrase "uniform distribution" where I really meant "observed frequency", which I realize now was clearly the wrong choice of words! So yes, basically, all I have been testing up to this point is the overall frequency distribution over a number of iterations, just to give me a ballpark idea of how the algorithm performs. Even though we want an RNG to produce unpredictable values, we do want to make sure that it doesn't generate 65536 zeros, after all! Of course this is only a minor test of an algorithm's "worthiness", but it shouldn't be dismissed entirely, either.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  7. #37
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Sebastiani View Post
    That's an interesting thought, but I honestly don't grok the import of the logic there. If you don't know what all of the possibilities are, then why factor them in at all? That just seems illogical.
    Because it's inherently wrong to assume that what you've seen so far is the only thing you'll ever see. Suppose I'm trying to estimate the zeroth-order probability of a given sentence by multiplying the probabilities of the individual words. Suppose somebody gives me a sentence I haven't seen before, and it includes the word "gargantuan", which I have not seen before. According to my maximum likelihood estimate, the probability of this word is zero (I've never seen it), and so that zero propagates through the entire multiplication and I get a zero probability for the sentence. Yet the probability can't be zero, because somebody gave me the sentence.

    So you need a way of estimating the probability that you will see a novel event, in a way that maintains consistency of the distribution. This means you HAVE to discount some of the frequency amounts. And there are multiple ways of doing this. For instance, the Laplace estimator, the Jeffreys-Perks estimator, Lidstone, Good-Turing discounting, etc...
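    To make the simplest of those concrete, here is a minimal sketch of the Laplace (add-one) estimator (the function name, interface, and the idea of a fixed vocabulary size are assumptions for the example): every word type, seen or unseen, gets one phantom count, so a word like "gargantuan" ends up with a small but non-zero probability instead of collapsing the whole product to zero.
    Code:
    #include <cstddef>
    #include <string>
    #include <unordered_map>

    // Laplace ("add-one") estimate of a word's probability. vocab_size is
    // the assumed number of distinct word types, including unseen ones, so
    // no probability is ever exactly zero.
    double laplace_probability(const std::unordered_map<std::string, std::size_t>& counts,
                               std::size_t total_tokens,
                               std::size_t vocab_size,
                               const std::string& word)
    {
        const auto it = counts.find(word);
        const std::size_t c = (it != counts.end()) ? it->second : 0;
        return static_cast<double>(c + 1) / (total_tokens + vocab_size);
    }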
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  8. #38
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> According to my maximum likelihood estimate, the probability of this word is zero (I've never seen it), and so that zero propagates through the entire multiplication and I get a zero probability for the sentence. Yet the probability can't be zero, because somebody gave me the sentence.

    The probability is unknown. You can't assign a value to that! Anyway, couldn't you just ignore the term if it was zero?
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  9. #39
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Sebastiani View Post
    >> According to my maximum likelihood estimate, the probability of this word is zero (I've never seen it), and so that zero propagates through the entire multiplication and I get a zero probability for the sentence. Yet the probability can't be zero, because somebody gave me the sentence.

    The probability is unknown. You can't assign a value to that! Anyway, couldn't you just ignore the term if it was zero?
    The probability is unknown, but you still need to estimate it. Ignoring the term is not correct either, because suppose you have two sentences which differ only by the addition of an unseen word. You would compute these two sentences as having equal probability, although you know that the sentence with the unseen word should have lower probability, precisely BECAUSE you have not seen that word before.

    The real problem here is that if you are trying to construct an info-theoretic model of something, your model needs to do something reasonable when you are presented with new data. Consider a calculation of the KL-divergence between two joint probability distributions. There may be some joint events in one distribution that do not occur in the other, just by chance. But the KL divergence is defined in terms of RATIOS of probabilities, which means that you could end up dividing by zero, giving an estimate of the divergence as "INFINITE" when you know that the distributions are actually very similar, apart from some rare event which was seen in one but not the other.

    Since the KL-divergence tells you the number of extra bits per symbol you pay when encoding data from one distribution with a code that is optimal for the other, an infinite divergence would mean you'd require an infinite number of bits to encode an unseen symbol. But that's WRONG, because you can always invent a coding scheme that requires fewer than an infinite number of bits to encode an unseen symbol.

    Sorry for spewing. It's just too common that people measure frequency counts and say that this defines the actual distribution. It is merely one way of estimating the distribution, yet people give it high status over other methods, probably because it's simple.
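    As an illustration of the divide-by-zero problem (a sketch under assumptions -- the pseudo-count value and the function name are made up), here's a KL-divergence computed from two count histograms with a small Lidstone-style pseudo-count added to every cell, so an event seen in one sample but not the other no longer blows the estimate up to infinity.
    Code:
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Kullback-Leibler divergence D(P || Q) in bits, estimated from two
    // count histograms of the same size. The pseudo-count keeps every cell
    // strictly positive, avoiding division by zero for unseen events.
    double kl_divergence_bits(const std::vector<std::size_t>& p_counts,
                              const std::vector<std::size_t>& q_counts,
                              double pseudo = 0.5)
    {
        const std::size_t k = p_counts.size();
        double p_total = 0.0, q_total = 0.0;
        for (std::size_t i = 0; i < k; ++i)
        {
            p_total += p_counts[i] + pseudo;
            q_total += q_counts[i] + pseudo;
        }

        double d = 0.0;
        for (std::size_t i = 0; i < k; ++i)
        {
            const double p = (p_counts[i] + pseudo) / p_total;
            const double q = (q_counts[i] + pseudo) / q_total;
            d += p * std::log2(p / q);
        }
        return d;
    }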
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  10. #40
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> The probability is unknown, but you still need to estimate it. Ignoring the term is not correct either, because suppose you have two sentences which differ only by the addition of an unseen word. You would compute these two sentences as having equal probability, although you know that the sentence with the unseen word should have lower probability, precisely BECAUSE you have not seen that word before.

    Ah, true. Yeah, that definitely makes sense.

    >> Sorry for spewing. It's just too common that people measure frequency counts and say that this defined the actual distribution. This is merely one way of estimating the distribution, yet people give it high status over other methods, probably because it's simple.

    No worries, I really appreciate the clarification. I'll definitely look into these more advanced methods, eventually. They certainly appear to be powerful tools for analyzing this sort of thing.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  11. #41
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    To think of it another way, consider a perfect, uniform RNG which generates integers in the range [0, 2]. You draw three numbers from this range. The only possible combinations are:

    Code:
    0 0 0
    0 0 1
    0 0 2
    0 1 0
    0 1 1
    0 1 2 *
    0 2 0
    0 2 1 *
    0 2 2
    1 0 0
    1 0 1
    1 0 2 *
    1 1 0
    1 1 1
    1 1 2
    1 2 0 *
    1 2 1
    1 2 2
    2 0 0
    2 0 1 *
    2 0 2
    2 1 0 *
    2 1 1
    2 1 2
    2 2 0
    2 2 1
    2 2 2
    I've marked the sequences which have uniform frequency counts -- there are 6 of them. So, given three draws of random numbers in the [0,2] range, only 6 of the 27 equally likely sequences produce perfectly flat frequency counts. This means that, given three drawn values, it's more likely than not that the observed counts will be NON-uniform.

    In other words, if I see a perfect set of counts, I have some reason to suspect that the RNG is not actually random, because that outcome is less likely than drawing a set with unequal counts.
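    (The 6-out-of-27 figure can be reproduced mechanically; a tiny brute-force sketch:)
    Code:
    #include <array>
    #include <iostream>

    // Enumerate all 3^3 = 27 equally likely sequences of three draws from
    // {0, 1, 2} and count the ones whose frequency table is perfectly flat
    // (each value exactly once). Prints "6 of 27".
    int main()
    {
        int uniform = 0, total = 0;
        for (int a = 0; a < 3; ++a)
            for (int b = 0; b < 3; ++b)
                for (int c = 0; c < 3; ++c)
                {
                    std::array<int, 3> count = {0, 0, 0};
                    ++count[a]; ++count[b]; ++count[c];
                    ++total;
                    if (count[0] == 1 && count[1] == 1 && count[2] == 1)
                        ++uniform;
                }
        std::cout << uniform << " of " << total
                  << " sequences have perfectly flat counts\n";
    }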
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  12. #42
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    We're talking about pseudo-random number generators here, though, are we not? I think it's entirely reasonable to test a pseudo-random generator by generating its complete sequence and making sure that every possible number has been generated the same number of times. The fact that knowledge about the past gives you information about the future is inherent in the principle of a PRNG and is its distinguishing feature from real RNGs.
    And if during its entire period, the PRNG does not emit every single number the same number of times, then it's not uniform, because if you have no knowledge of the PRNG's past, you can still use this statistic to predict that some numbers are more likely than others.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  13. #43
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> We're talking about pseudo-random number generators here, though, are we not? I think it's entirely reasonable to test a pseudo-random generator by generating its complete sequence and making sure that every possible number has been generated the same number of times. The fact that knowledge about the past gives you information about the future is inherent in the principle of a PRNG and is its distinguishing feature from real RNGs.

    Well exactly. Uniformity is a good indicator of the spread of values that you can expect to obtain, and the higher the uniformity value, the better. Using the UD function I posted earlier I've been getting a score of roughly 0.80 (on a scale of 0.0 - 1.0) on most good RNGs (with the Mersenne twister scoring highest, generally), which is of course acceptable. The uniformity can also tell you something about the mathematical properties of the function, since some have a good spread overall but exhibit peaks at regular intervals, which is not a very desirable property.

    My next area of focus is analyzing the relative randomness of the function (e.g., the amount of change from one value to the next). I thought about using some sort of classical entropy function, but another viable option would be to use Fourier and wavelet techniques, which is what I'm working on at the moment. Should be interesting!
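    (One simple, standard measurement of that value-to-value dependence -- not the UD function from earlier, and not the Fourier approach, just an illustrative sketch with a made-up function name -- is the lag-1 serial correlation, which should hover near zero for a good generator:)
    Code:
    #include <cstddef>
    #include <vector>

    // Lag-1 serial correlation of a sample sequence: a rough measure of how
    // much one output tells you about the next. Values near zero suggest
    // little linear dependence between consecutive outputs.
    double lag1_correlation(const std::vector<double>& x)
    {
        const std::size_t n = x.size();
        if (n < 2) return 0.0;

        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        double num = 0.0, den = 0.0;
        for (std::size_t i = 0; i < n; ++i)
        {
            den += (x[i] - mean) * (x[i] - mean);
            if (i + 1 < n)
                num += (x[i] - mean) * (x[i + 1] - mean);
        }
        return (den == 0.0) ? 0.0 : num / den;
    }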
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  14. #44
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by CornedBee View Post
    We're talking about pseudo-random number generators here, though, are we not? I think it's entirely reasonable to test a pseudo-random generator by generating its complete sequence
    I agree in theory, but most of the better RNGs have periods which are absurdly long, making it infeasible to test this way.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  15. #45
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    That is true.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law
