# Thread: A new random number generator!

1. ## A new random number generator!

Well, I've learned a lot about random numbers this week. Having tried almost a dozen different designs I've come to the conclusion that designing a good generator isn't at all trivial! But it was fun, anyway.

So here's what I came up with. It's incredibly simple, actually, and yet it has some pretty interesting properties:

- Bit width neutral. No matter how many bits in the underlying type, the algorithm is identical and requires no "magic" numbers.
- Invariant period. Regardless of the seed, the period is exactly 4^N, where N is the number of bits in the underlying type. For example, using 32-bit integers the period would be 18,446,744,073,709,551,616, and with a 64-bit type it would be roughly 3.4e+38.
- Good distribution. Over the length of it's period the distribution averages about 82%. Using different cycle lengths the value obtained was generally comparable to other popular algorithms.
- Extremely fast. Each value generated only requires 1 multiply, 4 additions, 4 bit manipulations, and 2 compares. An extra addition is required every 2^N iterations.
- Low overhead. Only 3 words of memory required.

There are, however, some possible disadvantages:

- Due to the simplicity of the operations, there may be some inherent mathematical weaknesses in the context of certain applications.
- Probably not suitable for applications requiring cryptographically secure random numbers (for the above reasons).

If nothing else, though, the algorithm might be a useful component for more complex multiphase generators.

Here is a side-by-side cross-section of a plot of the algorithm compared with the Marsenne twister (using 32768 and 131072 samples, respectively). Can you tell which is which?

Anyway, I would be interested to hear some opinions from the experts, and would like to know if it has any serious flaws?

2. Wow, that code is scarily skinny, I can understand your concern of it's 'randomness'. But, I don't think a potential flaw found be found in the screenies (unless it's really bad). I guess yours is on the left?

3. Sorry, but it's horrible. Let me write the formula you use here:

Code:
```X(1) = seed + ~(((1*seed) + 1) ROL 1)
X(2) = X(1) + ~(((2*X(1)) + 1) ROL 1)```
ROL means rotate left. So, let's describe the formula better:

Code:
```X(0) = seed
X(n) = X(n-1) + ~(((n*X(n-1)) + 1) ROL 1)```
That means that given a number, the next number is COMPLETELY independent of the seed. Given a number, the next number can be calculated by simply filling in the formula if you know how many numbers have been generated already. If not, you need two to predict the rest.

Not only that, but it shouldn't be hard to find the seed either if you get the first number.

4. Does it pass the DieHard tests?
Diehard tests - Wikipedia, the free encyclopedia

5. >> Sorry, but it's horrible. Let me write the formula you use here:

How so? As in "horribly simple"? My main goal was to make an algorithm that had predictable, generic properties that worked on any number of bits. I believe I've achieved that.

>> Given a number, the next number can be calculated by simply filling in the formula if you know how many numbers have been generated already. Not only that, but it shouldn't be hard to find the seed either if you get the first number.

I think that falls under the second category of "disadvantages" that I listed. Security was not my focus here at all. The types of flaws I am concerned with are lapses in periodicity, bad distribution, and evidence that it is not in fact "bit width neutral".

>> If not, you need two to predict the rest.

Are you sure? Can you guess what follows these two numbers?

4290416596
806034681

>> But, I don't think a potential flaw found be found in the screenies (unless it's really bad).

Well, yes and no. Plotting the values is on the one hand a suprisingly effective tool for evaluating the basic properties of a generator, since our eyes are remarkably good at recognizing patterns (or the lack thereof). On the other hand, there are certainly subtle mathematical relationships that would be missed using this "method" and a more rigorous analysis would be necessary to find them.

>> Let's see what NIST thinks about it

Good idea. I had gotten a hold of one of their programs recently only to find that it had a very primitive interface and was difficult to work with, but the package you linked to looks like it might be a newer one, so I think I'll give it a shot.

6. >> Does it pass the DieHard tests?

Unfortunately, it only compiles on 'nix systems. But I have heard that it's one of the best.

7. Code:
```X(n-1) = 4290416596
x(n) = 806034681
X(n) = 4290416596 + ~(((n*4290416596) + 1) ROL 1)
810585381 = ~(((n*4290416596) + 1) ROL 1)
3484381914 = (((n*4290416596) + 1) ROL 1)
1742190957 = ((n*4290416596) + 1)
1742190956 = (n*4290416596)
n = 9999
x(10000) = X(9999) + ~(((10000*X(9999)) + 1) ROL 1)
x(10000) = 806034681 + ~(((10000*806034681) + 1) ROL 1)
x(10000) = 3419643861```
So the next number should be 3419643861 if I haven't made any mistakes.

I might even be able to find out the seed... I'll post if I manage to do it. [Edit: never mind, too much work :P]

Distribution isn't too great either: For the first several numbers, if X(n) is low, X(n+1) is likely to be very high.

8. >> So the next number should be 3419643861 if I haven't made any mistakes.

You didn't. =)

>> Distribution isn't too great either: For the first several numbers, if X(n) is low, X(n+1) is likely to be very high.

I was aware of that but didn't want to muddle with it at this point. At any rate, I'm running the NIST tests right now, so hopefully that'll help shed some light on the more important mathematical properties of the generator.

9. Ok, well the one thing I have established at this point is that the NIST test suite sucks! You can't automate anything and the input format is really wonky (you have to convert the input to binary ASCII strings and replace any CRLF with LF). So far it's choked on all of the files I've given it. Bleh. I guess I'll try binary mode (I wonder what endian mode it expects?).

10. Originally Posted by Sebastiani
I wonder what endian mode it expects?
It doesn't matter. Otherwise the random number generator could simply modify its endianess and be more (or less) random, which is of course nonsense.

11. >> It doesn't matter. Otherwise the random number generator could simply modify its endianess and be more (or less) random, which is of course nonsense.

Good point.

12. Well it looks like the problem I was having with the tests was an underflow issue - I was a few hundred thousand bits shy of the recommended sample size! I thought of just running the generator as is and have the test suite "think" it was a longer sample size, but that probably wouldn't yield accurate results, would it? I'll probably need to use a big integer but then I don't know how to modify the marsenne twister algorithm (to use as a comparison) for such a large data type. Any ideas?

13. Originally Posted by Sebastiani
Well it looks like the problem I was having with the tests was an underflow issue - I was a few hundred thousand bits shy of the recommended sample size! I thought of just running the generator as is and have the test suite "think" it was a longer sample size, but that probably wouldn't yield accurate results, would it? I'll probably need to use a big integer but then I don't know how to modify the marsenne twister algorithm (to use as a comparison) for such a large data type. Any ideas?
Write it as binary to an output file (big endian, little endian, whatever you like)
Repeat number_of_bytes / 4 times.
That would give you the size you want, right?

14. [rant]

I'm just fuming here, but just to give you an idea of how badly implemented the NIST code is:

Code:
```	if ( ((X = (double*) calloc(n,sizeof(double))) == NULL) ||
((wsave = (double *)calloc(2*n+15,sizeof(double))) == NULL) ||
((ifac = (double *)calloc(15,sizeof(double))) == NULL) ||
((m = (double*)calloc(n/2+1, sizeof(double))) == NULL) ) {
fprintf(stats[7],"\t\tUnable to allocate working arrays for the DFT.\n");
if( X == NULL )
free(X);
if( wsave == NULL )
free(wsave);
if( ifac == NULL )
free(ifac);
if( m == NULL )
free(m);
return;
}```
Can we say "memory leak" (although "hemorrhage" might be more appropriate, considering the sizes involved)?

Besides that, the programmer seemed way too comfortable with using goto's, including one bit where he had a goto jump directly to another goto instead of directly to the target!

Oh and did I mention the so-called user-interface? It's menu based and honestly, I've seen better systems implemented by the newb's that post their homework assignments here!

And then there are the functions that modify global variables in ways that can have side-effects on other unsuspecting functions unless special steps are taken.

Oh the humanity!

[/rant]

Ok, I'm done. =D