# Thread: A new random number generator!

1. Originally Posted by Sebastiani
I'm just fuming here, but just to give you an idea of how badly implemented the NIST code is:
You know, it's a poor workman who blames his tools

2. >> You know, it's a poor workman who blames his tools

Yeah, well, in this case they're like cheap Chinese knock-offs! If the nail breaks the hammer, then you're probably in trouble, methinks. =/

3. Originally Posted by EVOEx
Write it as binary to an output file (big endian, little endian, whatever you like)
Repeat number_of_bytes / 4 times.
That would give you the size you want, right?
The problem with that tactic is that you're invoking the generator more times than would be required if it were the proper bit-width, and assembling a larger stream of bits from a smaller one will probably botch the result, since the smaller one has a much shorter period, etc.

4. Ok, so I've run a good number of tests on the generator (I gave up on the NIST suite but found a decent one here that was much easier to use) and it appears to perform well (I'm a bit surprised, actually). I've also made some minor changes to the algorithm to fix the initial convergence problem that EVOEx pointed out (with no cost to performance, luckily) and confirmed its period and distribution. I've been wanting to put together some decent templated test functions but so far haven't had much time to work on them. If anyone is interested in working on some, feel free to post them here. =)

Here are the ones I have so far:

Code:
```#include <algorithm>   // std::find
#include <cmath>       // std::fabs, std::atan, std::pow
#include <cstddef>     // size_t
#include <limits>      // std::numeric_limits
#include <numeric>     // std::accumulate

// Locate the start of the repeating tail of a sequence; on success,
// the period length is stored through the optional out-parameter.
template < typename Iterator >
Iterator find_period( Iterator begin, Iterator end, size_t* period = 0 )
{
    for( ; begin != end; ++begin )
    {
        Iterator found = begin;
        while( ( found = std::find( found + 1, end, *begin ) ) != end )
        {
            bool counting = true;
            size_t count = 1;
            Iterator next = begin, search = found;
            for( ;; )
            {
                if( ++next == found )
                    counting = false;
                if( ++search == end )
                    break;
                if( *next != *search )
                    break;
                if( counting )
                    ++count;
            }
            // Matched all the way to the end, over at least one full cycle.
            if( search == end && !counting )
            {
                if( period )
                    *period = count;
                return begin;
            }
        }
    }
    if( period )
        *period = 0;
    return end;
}

template < typename Container >
inline typename Container::const_iterator find_period( Container const& data, size_t* period = 0 )
{
    return find_period( data.begin( ), data.end( ), period );
}

template < typename Iterator >
size_t length_of_period( Iterator begin, Iterator end )
{
    size_t period;
    find_period( begin, end, &period );
    return period;
}

template < typename Container >
inline size_t length_of_period( Container const& data )
{
    return length_of_period( data.begin( ), data.end( ) );
}

// Normalized evenness of a histogram: 1.0 is perfectly uniform,
// 0.0 is everything piled into a single bin.
template < typename Iterator >
double frequency_distribution( Iterator seq, Iterator fin )
{
    double result = 0.0,
           size = std::distance( seq, fin ),
           count = std::accumulate( seq, fin, 0.0 );
    if( count == 0.0 )
        return 0.0;
    double expected = count / size,
           maximum_error = ( count - expected ) + ( expected * ( size - 1.0 ) );
    while( seq != fin )
        result += std::fabs( *seq++ - expected );
    return 1.0 - ( result / maximum_error );
}

template < typename Container >
inline double frequency_distribution( Container const& data )
{
    return frequency_distribution( data.begin( ), data.end( ) );
}

// Treat consecutive pairs of values as (x, y) points in the unit
// square, estimate pi from the fraction inside the unit circle,
// and return the relative error of that estimate.
template < typename Iterator >
double monte_carlo_pi( Iterator seq, Iterator fin )
{
    size_t size = 0, count = 0;
    double const pi = 4.0 * std::atan( 1.0 ),
        // unsigned char so digits == CHAR_BIT; numeric_limits< char >
        // reports 7 when char is signed, which skews the scale factor.
        factor = 1.0 / std::pow( 2.0, int( sizeof( *seq ) * std::numeric_limits< unsigned char >::digits ) );
    for( ;; )
    {
        if( seq == fin )
            break;
        double x = double( *seq++ ) * factor;
        if( seq == fin )
            break;
        double y = double( *seq++ ) * factor;
        if( ( ( x * x ) + ( y * y ) ) <= 1.0 )
            ++count;
        ++size;
    }
    return std::fabs( ( 4.0 * ( double( count ) / double( size ) ) ) - pi ) / pi;
}

template < typename Container >
inline double monte_carlo_pi( Container const& data )
{
    return monte_carlo_pi( data.begin( ), data.end( ) );
}

// Pearson chi-squared statistic over a histogram of (assumed
// equally likely) bins.
template < typename Iterator >
double chi_squared( Iterator seq, Iterator fin )
{
    double result = 0.0,
           size = std::distance( seq, fin ),
           count = std::accumulate( seq, fin, 0.0 );
    if( count == 0.0 )
        return 0.0;
    double expected = count / size;
    while( seq != fin )
    {
        double temp = ( *seq++ - expected );
        result += ( temp * temp ) / expected;
    }
    return result;
}

template < typename Container >
inline double chi_squared( Container const& data )
{
    return chi_squared( data.begin( ), data.end( ) );
}```
[EDIT]

Note: All of the above functions expect unsigned integers.

[/EDIT]

I'm pretty sure the chi-squared algorithm is correct, but so far I haven't been able to make much sense of the results. As for periodicity, frequency distribution, and the Monte Carlo pi test, both the generator and the Mersenne Twister performed exceedingly well. The standard library rand function did OK except for the latter test.

And here are the results of the Fourmilab "ent" test. I chose the absolute worst performers of numerous runs.

--- std::rand ---

Value Char Occurrences Fraction
0 17120290 0.531920
1 15065534 0.468080

Total: 32185824 1.000000

Entropy = 0.997058 bits per bit.

Optimum compression would reduce the size
of this 32185824 bit file by 0 percent.

Chi square distribution for 32185824 samples is 131176.45, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.4681 (0.5 = random).
Monte Carlo value for Pi is 3.830697142 (error 21.93 percent).
Serial correlation coefficient is -0.003975 (totally uncorrelated = 0.0).

--- mersenne ---

Value Char Occurrences Fraction
0 16078169 0.500499
1 16046079 0.499501

Total: 32124248 1.000000

Entropy = 0.999999 bits per bit.

Optimum compression would reduce the size
of this 32124248 bit file by 0 percent.

Chi square distribution for 32124248 samples is 32.06, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.4995 (0.5 = random).
Monte Carlo value for Pi is 3.148647377 (error 0.22 percent).
Serial correlation coefficient is -0.000144 (totally uncorrelated = 0.0).

--- garth ---

Value Char Occurrences Fraction
0 16079544 0.500566
1 16043184 0.499434

Total: 32122728 1.000000

Entropy = 0.999999 bits per bit.

Optimum compression would reduce the size
of this 32122728 bit file by 0 percent.

Chi square distribution for 32122728 samples is 41.16, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.4994 (0.5 = random).
Monte Carlo value for Pi is 3.147524816 (error 0.19 percent).
Serial correlation coefficient is -0.000060 (totally uncorrelated = 0.0).

--- placebo ---

Value Char Occurrences Fraction
0 27000000 0.843750
1 5000000 0.156250

Total: 32000000 1.000000

Entropy = 0.625262 bits per bit.

Optimum compression would reduce the size
of this 32000000 bit file by 37 percent.

Chi square distribution for 32000000 samples is 15125000.00, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bits is 0.1563 (0.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is 0.525926 (totally uncorrelated = 0.0).
The 'placebo' function was just a control case that always returns the same 32-bit number, as a sort of baseline worst-case scenario. All of the tests were performed on 1,000,000 32-bit numbers, except for std::rand, which was run on 2,000,000 16-bit numbers; each set of data was treated as a stream of 32,000,000 bits.

So the results look pretty good, but I'm still not totally convinced. I'm wondering what else I should be checking for?


5. Well, I would post a working copy of my proprietary RNG, but it won't let me upload rar files.

Anyway, non-trivial PRNGs exist, but anything based on a 64-bit number isn't going to be highly random.