Doubts about detecting the periodicity of psuedo random functions

**Sebastiani** · 07-12-2009

I'm trying to measure the periodicity of a random generator, but I've run into a theoretical problem. Basically, I was going on the assumption that if a generator repeated a sequence then the period was certain. And so far, this has been the case. But now that I think of it, a sufficiently complex generator could possibly repeat itself but not actually be stuck in a periodic cycle, couldn't it? If so, the only way I can think to handle it is to increase the number of passes and return some sort of probablistic "certificate" of periodicity. Am I way off base here?

**tabstop** · 07-12-2009

Many fancy random number generators have more "state" to them than just one number like rand() does, and I would expect you would need for the entire state to be duplicated before you get periodic behavior.

**iMalc** · 07-13-2009

Originally Posted by tabstop

Many fancy random number generators have more "state" to them than just one number like rand() does, and I would expect you would need for the entire state to be duplicated before you get periodic behavior.

tabstop is onto it here.
In fact you may only have to check for the internal state of the PRNG becoming equal to a copy of it's initial internal state. That's assuming there isn't any redundancy in its internal state.
It's enough for an LCG and would be for Mersenne twister too anyway.
Of course you'd need complete access to the internal state of the PRNG.

**Sebastiani** · 07-12-2009

>> Many fancy random number generators have more "state" to them than just one number like rand() does, and I would expect you would need for the entire state to be duplicated before you get periodic behavior.

Yes, of course. My brain is fried right now so it just didn't seem so obvious. Thanks for setting that straight.

Well, if nothing else I can use the function for testing purposes to let me know if a function is *not* periodic within a certain range of iterations. But then again, let's say we have a function that returns the same period 5 times in a row. Although we don't really know if it is in fact the period of the function, we could in fact make a subjective judgement about the the relative "quality" of the function, right?

**whiteflags** · 07-12-2009

Do you mean a PRNG that returns the same string of numbers with different seeds?

Yeah. It sucks.

**tabstop** · 07-12-2009

Originally Posted by whiteflags

Do you mean a PRNG that returns the same string of numbers with different seeds?

Yeah. It sucks.

I'm pretty sure he means that the first number is seen again as (say) #1001, and then again at #2001, and then again at #3001, and then again at #4001, and then again at #5001, at which point the claim that the period of the prng = 1000 looks pretty good.

**Sebastiani** · 07-12-2009

>> I'm pretty sure he means that the first number is seen again as (say) #1001, and then again at #2001, and then again at #3001, and then again at #4001, and then again at #5001, at which point the claim that the period of the prng = 1000 looks pretty good.

Right, well the entire sequence inclusive. So for example:

(27)(68)(27)(36)(27)(31)(27)(24)(27)(17)

Even though the value 27 occurs regularly this would not be considered a periodic sequence. But now let's say we have this:

(22)(68)(13)(27)(21)(31)(12)(83)(36)(27)

Still fine, but now we can see that after generating the last number in the sequence that it is also present in the 4th position. So to test for periodicity, we "walk" the length of the sequence up to and including the last one shown, invoking the generator at every step and testing it against the number in that position. If we reach the end we assume that the sequence is periodic and return the length of the period. Otherwise, we append all of the numbers that were generated during the comparison onto the original sequence and continue the process. To make it more clear, this is the actual function:

Code:

/*
    Returns either the period of the generator or zero if none 
    was found within the maximum number of iterations allowed
*/
template < typename Type, typename Generator >
size_t period_of_generator( Generator generator, size_t iterations )
{
    vector< Type >
        buffer, 
        saved;
    for( ;; )
    {
        if( iterations-- == 0 )
            return 0;    
        Type
            value = generator( );
        typename vector< Type >::const_iterator
            begin = buffer.begin( ),
            end = buffer.end( ),
            found = find( begin, end, value );
        if( found == end )
            buffer.push_back( value );
        else
        {            
            saved.push_back( value );
            while( ++found != end )
            {
                if( iterations-- == 0 )
                    return 0;
                value = generator( );
                saved.push_back( value );
                if( value != *found )
                    break;
            }
            if( found == end )
                return saved.size( );
            buffer.insert( buffer.end( ), saved.begin( ), saved.end( ) );
            saved.clear( );
        }
    }
    return 0;
}

Of course I can see now that this isn't really a true test of periodicity, and it would probably be more useful if it had an option to make several passes and return "a degree of certainty". But even as is, when used on really good random number generators at several million iterations it is a reliable test of the function's *lack* of periodicity, anyway - which is helpful.

**Sebastiani** · 07-13-2009

>> In fact you may only have to check for the internal state of the PRNG becoming equal to a copy of it's initial internal state. That's assuming there isn't any redundancy in its internal state. It's enough for an LCG and would be for Mersenne twister too anyway. Of course you'd need complete access to the internal state of the PRNG.

That makes perfect sense. Thanks. Well in that case, I could probably use a function similar to this as applied to the internal state, then?

One more question. I've read a lot about uniform distribution functions but I can't seem to wrap my head around it algorithmically. I've always had trouble translating advance mathematical nomenclature into concrete ideas, for some reason. Anyone have any insight into that?

**tabstop** · 07-13-2009

Originally Posted by Sebastiani

>> In fact you may only have to check for the internal state of the PRNG becoming equal to a copy of it's initial internal state. That's assuming there isn't any redundancy in its internal state. It's enough for an LCG and would be for Mersenne twister too anyway. Of course you'd need complete access to the internal state of the PRNG.

That makes perfect sense. Thanks. Well in that case, I could probably use a function similar to this as applied to the internal state, then?

One more question. I've read a lot about uniform distribution functions but I can't seem to wrap my head around it algorithmically. I've always had trouble translating advance mathematical nomenclature into concrete ideas, for some reason. Anyone have any insight into that?

What exactly is the question? A uniform distribution (assuming a discrete range, which any prng has) just means that every outcome is equally likely. Any algorithm that generates (say) each output value exactly once during a period would be uniform.

**iMalc** · 07-13-2009

Originally Posted by tabstop

What exactly is the question? A uniform distribution (assuming a discrete range, which any prng has) just means that every outcome is equally likely. Any algorithm that generates (say) each output value exactly once during a period would be uniform.

Except that perfect_uniformity == !random.

**laserlight** · 07-13-2009

Originally Posted by iMalc

Except that perfect_uniformity == !random.

What do you mean by "perfect uniformity"?

**iMalc** · 07-14-2009

Originally Posted by laserlight

What do you mean by "perfect uniformity"?

I mean that if it wont ever generate an outcome that has been seen n times already while there are other outcomes that have been seen only n-1 times, then clearly each outcome is not equally likely on any given individual outcome produced.

You certainly can't call a generator biased if after generating n outcomes (where there are exactly n possible outcomes), that the outcomes are not all unique.
Another way of looking at it, Given enough possible outcomes, the chance that after a given outcome generation, all outcomes have been generated an equal number of times, is remote enough that it would probably mean the RNG is cheating, isn't really unbiased, or isn't truly random.
None of this is anything anyone here wouldn't already know, and lets not get into a discussion on true randomness.

**brewbuck** · 07-14-2009

Originally Posted by iMalc

I mean that if it wont ever generate an outcome that has been seen n times already while there are other outcomes that have been seen only n-1 times, then clearly each outcome is not equally likely on any given individual outcome produced.

Just because the RNG does not produce exactly equal bucket counts doesn't mean it's not perfectly uniform. The estimate of the probability distribution based on frequencies of observed events is just that -- an estimate.

Honestly, I think you should look at the question in terms of mutual information between the bits of internal state of the RNG. The external state is not enough information to make a good decision on randomness. Unfortunately you may not have access to the internal state bits.

I wrote an experiment last night to measure this (mutual information between external state bits) for the rand() function of my platform (Ubuntu Linux), and found that the estimate of M.I. depends on the number of iterations performed. I did not collect enough data to see if the curve eventually levels off. However, after one billion observations, the estimate of mutual information between the first bit and the higher 15 bits (I limited the experiment to 16 bits of randomness) was somewhere around 2e-6.

For a perfect uniform random source, the MI should go to zero as the number of observations goes to infinity. 2e-6 isn't zero but it's pretty damn small (when compared to 1, the assumed entropy of each bit). Conclusion? My implementation of rand() is decent.

**iMalc** · 07-14-2009

Originally Posted by brewbuck

Just because the RNG does not produce exactly equal bucket counts doesn't mean it's not perfectly uniform. The estimate of the probability distribution based on frequencies of observed events is just that -- an estimate.

Exactly.

**Sebastiani** · 07-14-2009

Well, it looks like there were flaws in both the uniform distribution function and the periodicity detector. But I think I've straightened them out. For the former, the problem was that I wasn't factoring in the maximum error in order to obtain a useful result. The corrected version:

Code:

template < typename Container >
double uniform_distribution( Container const& data )
{
	double
		one = 1,
		result = 0, 
		size = data.size( ),
		count = std::accumulate( data.begin( ), data.end( ) , double( 0 ) ), 
		expected = count / size, 
		//maximum_error = ( size - one ) * ( expected + one ); // WRONG
		maximum_error = ( count - expected ) + ( expected * ( size - one ) );
	if( count == 0 )
		return std::numeric_limits< double >::quiet_NaN( );
	for
	( 
		typename Container::const_iterator seq = data.begin( ), fin = data.end( ); 
		seq != fin; 
		++seq 
	)
		result += fabs( *seq - expected );
	return one - ( result / maximum_error );
}

As for the latter, it was just plain broken.

The updated version is much simpler and generic, and is much more rigorous in that it doesn't just check the pattern once but repeatedly until it reaches the end:

Code:

template < typename Iterator >
size_t period_of_sequence( Iterator begin, Iterator end )
{
    for( ; begin != end; ++begin )
    {
        Iterator
            found = begin;
        while( ( found = find( found + 1, end, *begin ) ) != end )
        {
            bool
                counting = true;
            size_t
                count = 1;
            Iterator
                next = begin, 
                search = found;
            while( ++search != end )
            {
                if( *( ++next ) != *search )
                    break;
                if( next == found )
                    counting = false;
                if( counting )
                    ++count;
            }             
            if( search == end && !counting )
                return count;
        }
    }
    return 0;
}

template < typename Container >
inline size_t period_of_sequence( Container const& data )
{
    return period_of_sequence( data.begin( ), data.end( ) );
}

There are still a few cases where it could return a possible false positive. For instance if the pattern ends with two identical samples, it has no choice but to assume that the sequence repeats indefinitely. In that case, though, you could just expand the range and retest.

>> Just because the RNG does not produce exactly equal bucket counts doesn't mean it's not perfectly uniform.

Perfect uniformity *by definition* means a completely equal distribution of samples, doesn't it? Anyway, you wouldn't want it to be totally uniform but within a certain range. And at any rate, a test of uniformity has a limited relevance when considering the strengths and weakenesses of an RNG.

>> For a perfect uniform random source, the MI should go to zero as the number of observations goes to infinity. 2e-6 isn't zero but it's pretty damn small (when compared to 1, the assumed entropy of each bit). Conclusion? My implementation of rand() is decent.

Interesting. I haven't gotten quite that far yet, but it sounds like a useful test. I'm guessing that Ubuntu probably uses the Marsenne twister?

>> Another way of looking at it, Given enough possible outcomes, the chance that after a given outcome generation, all outcomes have been generated an equal number of times, is remote enough that it would probably mean the RNG is cheating, isn't really unbiased, or isn't truly random.

That's a good point. Would a good test for that be to sample the uniform distribution at certain intervals and calculate the relative entropy of the output?

Thread: Doubts about detecting the periodicity of psuedo random functions

Thread Tools

Search Thread

Display

Hybrid View

Doubts about detecting the periodicity of psuedo random functions

Similar Threads

time Delays and Random functions

random functions, is it possible?

How do I restart a random number sequence.

Random function's application:Dice Roller

random functions