Thread: Doubts about detecting the periodicity of psuedo random functions

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708

    Doubts about detecting the periodicity of psuedo random functions

    I'm trying to measure the periodicity of a random generator, but I've run into a theoretical problem. Basically, I was going on the assumption that if a generator repeated a sequence then the period was certain. And so far, this has been the case. But now that I think of it, a sufficiently complex generator could possibly repeat itself but not actually be stuck in a periodic cycle, couldn't it? If so, the only way I can think to handle it is to increase the number of passes and return some sort of probablistic "certificate" of periodicity. Am I way off base here?
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Many fancy random number generators have more "state" to them than just one number like rand() does, and I would expect you would need for the entire state to be duplicated before you get periodic behavior.

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by tabstop View Post
    Many fancy random number generators have more "state" to them than just one number like rand() does, and I would expect you would need for the entire state to be duplicated before you get periodic behavior.
    tabstop is onto it here.
    In fact you may only have to check for the internal state of the PRNG becoming equal to a copy of it's initial internal state. That's assuming there isn't any redundancy in its internal state.
    It's enough for an LCG and would be for Mersenne twister too anyway.
    Of course you'd need complete access to the internal state of the PRNG.

  4. #4
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> Many fancy random number generators have more "state" to them than just one number like rand() does, and I would expect you would need for the entire state to be duplicated before you get periodic behavior.

    Yes, of course. My brain is fried right now so it just didn't seem so obvious. Thanks for setting that straight.

    Well, if nothing else I can use the function for testing purposes to let me know if a function is *not* periodic within a certain range of iterations. But then again, let's say we have a function that returns the same period 5 times in a row. Although we don't really know if it is in fact the period of the function, we could in fact make a subjective judgement about the the relative "quality" of the function, right?
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  5. #5
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Do you mean a PRNG that returns the same string of numbers with different seeds?

    Yeah. It sucks.

  6. #6
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by whiteflags View Post
    Do you mean a PRNG that returns the same string of numbers with different seeds?

    Yeah. It sucks.
    I'm pretty sure he means that the first number is seen again as (say) #1001, and then again at #2001, and then again at #3001, and then again at #4001, and then again at #5001, at which point the claim that the period of the prng = 1000 looks pretty good.

  7. #7
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> I'm pretty sure he means that the first number is seen again as (say) #1001, and then again at #2001, and then again at #3001, and then again at #4001, and then again at #5001, at which point the claim that the period of the prng = 1000 looks pretty good.

    Right, well the entire sequence inclusive. So for example:

    (27)(68)(27)(36)(27)(31)(27)(24)(27)(17)

    Even though the value 27 occurs regularly this would not be considered a periodic sequence. But now let's say we have this:

    (22)(68)(13)(27)(21)(31)(12)(83)(36)(27)

    Still fine, but now we can see that after generating the last number in the sequence that it is also present in the 4th position. So to test for periodicity, we "walk" the length of the sequence up to and including the last one shown, invoking the generator at every step and testing it against the number in that position. If we reach the end we assume that the sequence is periodic and return the length of the period. Otherwise, we append all of the numbers that were generated during the comparison onto the original sequence and continue the process. To make it more clear, this is the actual function:

    Code:
    /*
        Returns either the period of the generator or zero if none 
        was found within the maximum number of iterations allowed
    */
    template < typename Type, typename Generator >
    size_t period_of_generator( Generator generator, size_t iterations )
    {
        vector< Type >
            buffer, 
            saved;
        for( ;; )
        {
            if( iterations-- == 0 )
                return 0;    
            Type
                value = generator( );
            typename vector< Type >::const_iterator
                begin = buffer.begin( ),
                end = buffer.end( ),
                found = find( begin, end, value );
            if( found == end )
                buffer.push_back( value );
            else
            {            
                saved.push_back( value );
                while( ++found != end )
                {
                    if( iterations-- == 0 )
                        return 0;
                    value = generator( );
                    saved.push_back( value );
                    if( value != *found )
                        break;
                }
                if( found == end )
                    return saved.size( );
                buffer.insert( buffer.end( ), saved.begin( ), saved.end( ) );
                saved.clear( );
            }
        }
        return 0;
    }
    Of course I can see now that this isn't really a true test of periodicity, and it would probably be more useful if it had an option to make several passes and return "a degree of certainty". But even as is, when used on really good random number generators at several million iterations it is a reliable test of the function's *lack* of periodicity, anyway - which is helpful.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  8. #8
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> In fact you may only have to check for the internal state of the PRNG becoming equal to a copy of it's initial internal state. That's assuming there isn't any redundancy in its internal state. It's enough for an LCG and would be for Mersenne twister too anyway. Of course you'd need complete access to the internal state of the PRNG.

    That makes perfect sense. Thanks. Well in that case, I could probably use a function similar to this as applied to the internal state, then?

    One more question. I've read a lot about uniform distribution functions but I can't seem to wrap my head around it algorithmically. I've always had trouble translating advance mathematical nomenclature into concrete ideas, for some reason. Anyone have any insight into that?
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  9. #9
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Sebastiani View Post
    >> In fact you may only have to check for the internal state of the PRNG becoming equal to a copy of it's initial internal state. That's assuming there isn't any redundancy in its internal state. It's enough for an LCG and would be for Mersenne twister too anyway. Of course you'd need complete access to the internal state of the PRNG.

    That makes perfect sense. Thanks. Well in that case, I could probably use a function similar to this as applied to the internal state, then?

    One more question. I've read a lot about uniform distribution functions but I can't seem to wrap my head around it algorithmically. I've always had trouble translating advance mathematical nomenclature into concrete ideas, for some reason. Anyone have any insight into that?
    What exactly is the question? A uniform distribution (assuming a discrete range, which any prng has) just means that every outcome is equally likely. Any algorithm that generates (say) each output value exactly once during a period would be uniform.

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by tabstop View Post
    What exactly is the question? A uniform distribution (assuming a discrete range, which any prng has) just means that every outcome is equally likely. Any algorithm that generates (say) each output value exactly once during a period would be uniform.
    Except that perfect_uniformity == !random.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #11
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by iMalc
    Except that perfect_uniformity == !random.
    What do you mean by "perfect uniformity"?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by laserlight View Post
    What do you mean by "perfect uniformity"?
    I mean that if it wont ever generate an outcome that has been seen n times already while there are other outcomes that have been seen only n-1 times, then clearly each outcome is not equally likely on any given individual outcome produced.

    You certainly can't call a generator biased if after generating n outcomes (where there are exactly n possible outcomes), that the outcomes are not all unique.
    Another way of looking at it, Given enough possible outcomes, the chance that after a given outcome generation, all outcomes have been generated an equal number of times, is remote enough that it would probably mean the RNG is cheating, isn't really unbiased, or isn't truly random.
    None of this is anything anyone here wouldn't already know, and lets not get into a discussion on true randomness.

  13. #13
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by iMalc View Post
    I mean that if it wont ever generate an outcome that has been seen n times already while there are other outcomes that have been seen only n-1 times, then clearly each outcome is not equally likely on any given individual outcome produced.
    Just because the RNG does not produce exactly equal bucket counts doesn't mean it's not perfectly uniform. The estimate of the probability distribution based on frequencies of observed events is just that -- an estimate.

    Honestly, I think you should look at the question in terms of mutual information between the bits of internal state of the RNG. The external state is not enough information to make a good decision on randomness. Unfortunately you may not have access to the internal state bits.

    I wrote an experiment last night to measure this (mutual information between external state bits) for the rand() function of my platform (Ubuntu Linux), and found that the estimate of M.I. depends on the number of iterations performed. I did not collect enough data to see if the curve eventually levels off. However, after one billion observations, the estimate of mutual information between the first bit and the higher 15 bits (I limited the experiment to 16 bits of randomness) was somewhere around 2e-6.

    For a perfect uniform random source, the MI should go to zero as the number of observations goes to infinity. 2e-6 isn't zero but it's pretty damn small (when compared to 1, the assumed entropy of each bit). Conclusion? My implementation of rand() is decent.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  14. #14
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by brewbuck View Post
    Just because the RNG does not produce exactly equal bucket counts doesn't mean it's not perfectly uniform. The estimate of the probability distribution based on frequencies of observed events is just that -- an estimate.
    Exactly.

  15. #15
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Well, it looks like there were flaws in both the uniform distribution function and the periodicity detector. But I think I've straightened them out. For the former, the problem was that I wasn't factoring in the maximum error in order to obtain a useful result. The corrected version:

    Code:
    template < typename Container >
    double uniform_distribution( Container const& data )
    {
    	double
    		one = 1,
    		result = 0, 
    		size = data.size( ),
    		count = std::accumulate( data.begin( ), data.end( ) , double( 0 ) ), 
    		expected = count / size, 
    		//maximum_error = ( size - one ) * ( expected + one ); // WRONG
    		maximum_error = ( count - expected ) + ( expected * ( size - one ) );
    	if( count == 0 )
    		return std::numeric_limits< double >::quiet_NaN( );
    	for
    	( 
    		typename Container::const_iterator seq = data.begin( ), fin = data.end( ); 
    		seq != fin; 
    		++seq 
    	)
    		result += fabs( *seq - expected );
    	return one - ( result / maximum_error );
    }
    As for the latter, it was just plain broken. The updated version is much simpler and generic, and is much more rigorous in that it doesn't just check the pattern once but repeatedly until it reaches the end:

    Code:
    template < typename Iterator >
    size_t period_of_sequence( Iterator begin, Iterator end )
    {
        for( ; begin != end; ++begin )
        {
            Iterator
                found = begin;
            while( ( found = find( found + 1, end, *begin ) ) != end )
            {
                bool
                    counting = true;
                size_t
                    count = 1;
                Iterator
                    next = begin, 
                    search = found;
                while( ++search != end )
                {
                    if( *( ++next ) != *search )
                        break;
                    if( next == found )
                        counting = false;
                    if( counting )
                        ++count;
                }             
                if( search == end && !counting )
                    return count;
            }
        }
        return 0;
    }
    
    template < typename Container >
    inline size_t period_of_sequence( Container const& data )
    {
        return period_of_sequence( data.begin( ), data.end( ) );
    }
    There are still a few cases where it could return a possible false positive. For instance if the pattern ends with two identical samples, it has no choice but to assume that the sequence repeats indefinitely. In that case, though, you could just expand the range and retest.

    >> Just because the RNG does not produce exactly equal bucket counts doesn't mean it's not perfectly uniform.

    Perfect uniformity *by definition* means a completely equal distribution of samples, doesn't it? Anyway, you wouldn't want it to be totally uniform but within a certain range. And at any rate, a test of uniformity has a limited relevance when considering the strengths and weakenesses of an RNG.

    >> For a perfect uniform random source, the MI should go to zero as the number of observations goes to infinity. 2e-6 isn't zero but it's pretty damn small (when compared to 1, the assumed entropy of each bit). Conclusion? My implementation of rand() is decent.

    Interesting. I haven't gotten quite that far yet, but it sounds like a useful test. I'm guessing that Ubuntu probably uses the Marsenne twister?

    >> Another way of looking at it, Given enough possible outcomes, the chance that after a given outcome generation, all outcomes have been generated an equal number of times, is remote enough that it would probably mean the RNG is cheating, isn't really unbiased, or isn't truly random.

    That's a good point. Would a good test for that be to sample the uniform distribution at certain intervals and calculate the relative entropy of the output?
    Last edited by Sebastiani; 07-14-2009 at 10:40 PM. Reason: corrected UD function
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. time Delays and Random functions
    By markphaser in forum C++ Programming
    Replies: 17
    Last Post: 02-20-2006, 07:10 PM
  2. random functions, is it possible?
    By sreetvert83 in forum C++ Programming
    Replies: 8
    Last Post: 09-02-2005, 11:50 AM
  3. How do I restart a random number sequence.
    By jeffski in forum C Programming
    Replies: 6
    Last Post: 05-29-2003, 02:40 PM
  4. Random function's application:Dice Roller
    By angeljicu in forum C Programming
    Replies: 0
    Last Post: 11-02-2002, 05:29 AM
  5. random functions
    By bluehead in forum C++ Programming
    Replies: 5
    Last Post: 11-09-2001, 04:04 AM