A new random number generator!

**MK27** · 07-20-2009

Originally Posted by Sebastiani

I'm just fuming here, but just to give you an idea of how badly implemented the NIST code is:

You know, it's a poor workman who blames his tools

**Sebastiani** · 07-20-2009

>> You know, it's a poor workman who blames his tools

Yeah well in this case they're like cheap chinese knock-offs! If the nail breaks the hammer then you're probaby in trouble, methinks. =/

**Sebastiani** · 07-20-2009

Originally Posted by EVOEx

Read an integer
Write it as binary to an output file (big endian, little endian, whatever you like)
Repeat number_of_bytes / 4 times.
That would give you the size you want, right?

The problem with that tactic is that you're invoking the generator more times than would actually be required had it been the proper bit-width, and assembling a larger stream of bits from a smaller one is probably going to botch the result since the smaller one has a much shorter period, etc.

**Sebastiani** · 07-26-2009

Ok, so I've run a good number of tests on the generator (gave up on the NIST suite but found a decent one here that was much easier to use) and it appears to perform well (I'm a bit suprised, actually). I've also made some minor changes to the algorithm to fix the initial convergence problem that Evoex pointed out (with no cost to performance, luckily) and confirmed it's period and distribution. I've been wanting to put together some decent templated test functions but so far haven't really had much time to work on them. If anyone is interested on working on some feel free to post them here. =)

Here are the ones I have so far:

Code:

template < typename Iterator >
Iterator find_period( Iterator begin, Iterator end, size_t* period = 0 )
{
    for( ; begin != end; ++begin )
    {
        Iterator
            found = begin;
        while( ( found = find( found + 1, end, *begin ) ) != end )
        {
            bool
                counting = true;
            size_t
                count = 1;
            Iterator
                next = begin,
                search = found;
            for( ;; )
            {
                if( ++next == found )
                    counting = false;
                if( ++search == end )
                    break;
                if( *next != *search )
                    break;
                if( counting )
                    ++count;
            }             
            if( search == end && !counting )
            {
                if( period )
                    *period = count;
                return begin;
            }   
        }
    }
    if( period )
        *period = 0;
    return end;
}

template < typename Container >
inline typename Container::const_iterator find_period( Container const& data, size_t* period = 0 )
{
    return find_period( data.begin( ), data.end( ), period );
}

template < typename Iterator >
size_t length_of_period( Iterator begin, Iterator end )
{
    size_t
        period;
    find_period( begin, end, &period );
    return period;
}

template < typename Container >
inline size_t length_of_period( Container const& data )
{
    return length_of_period( data.begin( ), data.end( ) );
}

template < typename Iterator >
double frequency_distribution( Iterator seq, Iterator fin )
{
    double
        result = 0.0,
        size = std::distance( seq, fin ),
        count = std::accumulate( seq, fin, 0.0 ),
        expected = count / size,
        maximum_error = ( count - expected ) + ( expected * ( size - 1.0 ) );
    if( count == 0.0 )
        return 0.0;
    while( seq != fin )
        result += fabs( *seq++ - expected );
    return 1.0 - ( result / maximum_error );
}

template < typename Container >
inline double frequency_distribution( Container const& data )
{
    return frequency_distribution( data.begin( ), data.end( ) );
}

template < typename Iterator >
double monte_carlo_pi( Iterator seq, Iterator fin )
{
    size_t
        size = 0, 
        count = 0;
    double const
        pi = 4.0  * atan( 1.0 ),
        factor = 1.0 / pow( 2.0, int( sizeof( *seq ) * numeric_limits< char >::digits ) );
    for( ;; )
    {
        if( seq == fin )
            break;
        double
            x = double( *seq++ ) * factor;
        if( seq == fin )
            break;
        double
            y = double( *seq++ ) * factor;
        if( ( ( x * x ) + ( y * y ) ) <= 1.0 )
            ++count;
        ++size;    
    }
    return fabs( ( 4.0 * ( double( count ) / double( size ) ) ) - pi ) / pi;
}

template < typename Container >
inline double monte_carlo_pi( Container const& data )
{
    return monte_carlo_pi( data.begin( ), data.end( ) );
}

template < typename Iterator >
double chi_squared( Iterator seq, Iterator fin )
{
    double
        result = 0.0, 
        size = std::distance( seq, fin ),
        count = std::accumulate( seq, fin, 0.0 );
    if( count == 0.0 )
        return 0.0;
    double
        expected = count / size;
    while( seq != fin )
    {
        double
            temp = ( *seq++ - expected );
        result += ( temp * temp ) / expected;
    }    
    return result;
}

template < typename Container >
inline double chi_squared( Container const& data )
{
    return chi_squared( data.begin( ), data.end( ) );
}

[EDIT]

Note: All of the above functions expect unsigned integers.

[/EDIT]

I'm pretty sure the chi-squared algorithm is correct, but as yet haven't been able to make much sense of the results. As far as periodicity, frequency distribution, and the monte-carlo pi test, both the generator and the marsenne twister performed exceedingly well. The standard library rand function did ok except for the latter test.

And here are the results of the Fourmilab "ent" test. I chose the absolute worst performers of numerous runs.

--- std::rand ---

Value Char Occurrences Fraction
0 17120290 0.531920
1 15065534 0.468080

Total: 32185824 1.000000

Entropy = 0.997058 bits per bit.

Optimum compression would reduce the size
of this 32185824 bit file by 0 percent.

Chi square distribution for 32185824 samples is 131176.45, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.4681 (0.5 = random).
Monte Carlo value for Pi is 3.830697142 (error 21.93 percent).
Serial correlation coefficient is -0.003975 (totally uncorrelated = 0.0).

--- marsenne ---

Value Char Occurrences Fraction
0 16078169 0.500499
1 16046079 0.499501

Total: 32124248 1.000000

Entropy = 0.999999 bits per bit.

Optimum compression would reduce the size
of this 32124248 bit file by 0 percent.

Chi square distribution for 32124248 samples is 32.06, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.4995 (0.5 = random).
Monte Carlo value for Pi is 3.148647377 (error 0.22 percent).
Serial correlation coefficient is -0.000144 (totally uncorrelated = 0.0).

--- garth ---

Value Char Occurrences Fraction
0 16079544 0.500566
1 16043184 0.499434

Total: 32122728 1.000000

Entropy = 0.999999 bits per bit.

Optimum compression would reduce the size
of this 32122728 bit file by 0 percent.

Chi square distribution for 32122728 samples is 41.16, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.4994 (0.5 = random).
Monte Carlo value for Pi is 3.147524816 (error 0.19 percent).
Serial correlation coefficient is -0.000060 (totally uncorrelated = 0.0).

--- placebo ---

Value Char Occurrences Fraction
0 27000000 0.843750
1 5000000 0.156250

Total: 32000000 1.000000

Entropy = 0.625262 bits per bit.

Optimum compression would reduce the size
of this 32000000 bit file by 37 percent.

Chi square distribution for 32000000 samples is 15125000.00, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.1563 (0.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is 0.525926 (totally uncorrelated = 0.0).

The 'placebo' function was just a control case that always returns the same 32-bit number (as a sort of base-line worst-case scenario). All of the tests were performed on 1,000,000 32-bit numbers, except for std::rand, which was done on 2,000,000 16-bit numbers, and each set of data was treated as a stream of 32,000,000 bits.

So the results look pretty good, but I'm still not totally convinced. I'm wondering what else I should be checking for?

-

**abachler** · 07-30-2009

Well, I would post a working copy of my proprietary RNG, but it wont let me upload rar files.

Anyway, non-trivial PRNG's exist, but anything based on a 64 bit number isn't going to be highly random.

Thread: A new random number generator!

Thread Tools

Search Thread

Display

Similar Threads

Issue w/ Guess My Number Program

Good Random Number Generator

How do I restart a random number sequence.

how to link random number generator with cpu?

Seeding Random Number Generator