integer arrays

**iMalc** · 08-12-2010

Originally Posted by johnggold

It's clear that you don't have my experience, because you are fixated with randomness and do not consider balance and the importance of accuracy to the application.

I've been programing for 21 years. I only didn't go into detail of the other ways in which is sucks because I don't have all day.

When using a weighted sample balance is much more important. Balance in this case is crucial, as if there is any bias - which rand() is notorious for - you will not get a good result on small samples.

rand() is not notorious for distribution bias, and what you posted is definitely far worse, as pianorain has shown. The bias you tend to get from rand() comes from using 'mod' to restrict the range instead of a statistically unbiased technique such as the "rejection method". The code you posted has the flawed 'mod' method built right in, so distribution bias is also built in, and this too is obvious at a glance.

Have you tested your routine against mine for balance. Probably not.

"Balance" is not a property of a random number generator - see my next post.

I assume that you have heard of the drunken man's walk problem. If not, go and find out.

Sure. This page has a good description of it: http://samrizvi.wordpress.com/ After reading that I realise that I had heard of that before but didn't put the name to the description. Were you wanting to refer to something about it or just dropping some jargon?

I think it is highly presumptuous to assume that because a routine is old, it is necessarily bettered by more modern algorithms, and it worries me greatly that you think that change for changes sake is a good idea.

I agree that it would be highly presumptious. I however are making no such presumtpions. I have already analysed the source of the pathetic LCRNG and can tell how some of its shortfalls just by looking at the code.

It is particularly presumptuous to assume that we don't check out the latest algorithms all the time.

If you'd investigated anything from the last 10 years then you'd know how much better everything else is. It's as simple as that.

I can only presume that you do not work in a field where code reliability is more important than using the latest algorithms, and I have spent too much time dealing with the mess left by programmers with such an attitude.

You've really put your foot in it there. I'm actually a senior software engineer working in the security industry. Reliability is more important for us that someone in any other field. When you have your software running at a nuclear power plant and underground mines etc (and we do), you cannot afford a lack of reliability and correctness. Who's being presumptious now huh!

You completely missed out on the small range parameter - whats the point of millions of different values when there are so few possible outcomes.

In other words you don't understand the importance of a longer period until the sequence repeats?

I suggest that you build a test program. Use your algorithm and mine, test for balance over a small range.

Again, "balance" is not a property of a random number generator.

Then do a speed check. I have to call this routine millions of times per minute in one of our polygon nesting applications.

All LCRNG's are the same in terms of performance. Only difference is the size of the seed and the magic numbers. The average rand() function is the same speed. Mersenne Twister has been benchmarked against an LCRNG many times before by others and the results will obviously surprise you. benchmark Mersenne Twister - Google Search

**iMalc** · 08-13-2010

My apologies if my last post offended you. I was responding in kind. I have since calmed down and edited it.

I think I see the problem here, you've been using the word "balance" here several times and it's becomming clear that you seem to be under the impression that a desireable property of a pseudo random number generator, is the ability for it to produce each outcome an equal number of times, where once each outcome is produced, it is not produced again until all (or almost all) other outcomes have been produced. Or at least very close to that.

If so, then that is the main problem with your argument. You are confusing "balance" which is not a property of a random number generator, with "uniform distribution" which is (except when it's a different type of distribution). Uniform distribution does not imply "balance". Uniform distribution implies that in the long run there are approximately the same number of results for each outcome. Just as when flipping a coin your first five flips may turn out heads, this does not mean that the coin does not conform to a "uniform distribution". If the next 95 or so also come out heads then it almost certainly is not uniform, but if after 100 throws you have approximately a 50/50 split then it is uniform, even if say you got ten heads in a row during that time.

Put another way, a property of randomness is that one outcome is not affected by the previous outcome(s). If your PRNG exhibits this so called "balance" then that is precisely one of the properties that makes it a very poor PRNG.

Now I'm definitely not saying that this "balance" property is itself always a bad thing. Such a feature can actually be very useful depending on what you are using it for, perhaps precisely in the things you use it for. However, this does mean that as far as the definitions of a random number generator are concerned, it is extremely poor at generating random numbers.
Note that "balance" was in no way even hinted at as being a desireable behaviour for the question we were asked about. So if you feel that "I'm fixated with randomness" rather than balance, then that only because I'm sticking to what is relevant to the OP.

As for "accuracy" you'll have clarify what you had in mind when using that term.

**johnggold** · 08-13-2010

I take it that you used the routine recursively - that's how its meant to be used, rather than trying different start values, checking the result. I take it that you are familiar with recursion.

Please don't confuse random selection with balance. Your remark that you might as well count from 1 to n clearly demonstrates your lack of understanding of balance.

Further, you have to look at the application. Very few applications need perfect random selection for sampling to deliver acceptable results - after all its random selection anyway.

The routine we use is a single line procedure - very fast - works in any configuration - OS independent.

Using Mersenne twister - apart from it being large and very slow (compared to a one line recursive algorithm) you have to worry about implementation, 32 bit, 64 bit etc. Then there is the question whether the same result is repeatable across different platforms.

The commercial outcome with real life problems will be the same, so why bother. That's programming for the sake of the programmer. Ugh!!

Did you benchmark against Mersenne - if you did you'll notice quite a difference.

Try a million recursions - a daily use of this in our software.

Getting back to the original question, try a million samples with both routines of the same weighted array, check the difference against the original weighting.

Then we can talk again.

**laserlight** · 08-13-2010

Originally Posted by johnggold

I take it that you used the routine recursively - that's how its meant to be used, rather than trying different start values, checking the result.

That is quite obvious: if one were to try "different start values" each time, then it would defeat the purpose of a PRNG. This should have been obvious by iMalc's question that "you don't understand the importance of a longer period until the sequence repeats?"

Originally Posted by johnggold

Please don't confuse random selection with balance. Your remark that you might as well count from 1 to n clearly demonstrates your lack of understanding of balance.

To iMalc as well: I suggest citing the author of a quote when responding to it. The remark that you refer to was made by pianorain, not iMalc. As to your claim that people do not understand "balance": what do you understand by the term "balance"?

Originally Posted by johnggold

Getting back to the original question, try a million samples with both routines of the same weighted array, check the difference against the original weighting.

Have you done this? You have not even addressed the well known introduction of bias problem with using modulo indiscriminately that iMalc cited.

**pianorain** · 08-13-2010

iMalc is quite correct to point out the difference between what you are calling 'balance' and uniform distribution. You might take another look at that.

An easy thought experiment can be conducted to determine if this algorithm is random. Pick any seed, any at all. Then we'll play a game consisting of lots of rounds. Each round, I'll pick either heads (0) or tails (1). Then you generate one number (start = 0, count = 1) and tell me what the algorithm generated. If I guess wrong, I owe you money. If I guess right, you owe me. If this algorithm was really random, it wouldn't hemorrhage money after the first round.

A different tact: start with any seed and begin generating numbers with start = 0, count = 255. At some point you will generate 49 (and when I say 'some point', I mean 'after 0'). The next three numbers will be 142, 167, and 204. The fact that I can tell you this without knowing anything about the internals of your algorithm means that your algorithm is not random.

**johnggold** · 08-13-2010

This is all getting out of hand. I can never understand why shouting and abuse is necessary or desirable (imalc). laserlight has been highly critical of my approach without getting personal.

I do loads of testing, but as an application programmer writing for an industry running 24/4, where a mistake can bring the factory to a halt, I'm always wary.

As an application programmer of 30 years, I always consider the outcome. Apart from the necessary surrounding boring stuff, most of my development is on 2D optimisation using heuristic algorithms, and I pioneered the use of random number selection as an integral part of the algorithms used, and lecture on it.

Mersenne twister is not new. I first came across it in the 90's, and using it simply slowed my application down to non-commercial timescales, and showed no benefit in the algorithms efficiency.

This algorithm is just a part of the total processing time. PC's have got much faster, but so have other demands for more complex algorithm constraints, which increase processing time.

I'm currently working with two university research departments on using latest mathematical polygon manipulation tools as an integral part of our algorithm. I only mention this because there has been an implied suggestion that we don't look at new ideas.

These new tools may improve outcomes, whereas Mersenne Twister does not - it simply provides a different but equally usable input - slower.

**Elysia** · 08-13-2010

Apples and oranges, mate. What do you need more - speed or "randomness"? Take your pick. You can't always have everything.
Point being, if the tool you use is tailored to your use, it will improve things; otherwise it will not. You claim you don't need as much randomness as the twister gives, so instead you use a custom algorithm that is faster but produces less randomness. That's fine.
But when you start criticizing professional programmers on this board...

**johnggold** · 08-13-2010

Firstly, you will find it wasn't me that cast the first stone - I've been defending.

Second, if you can't take criticism don't dish it out. No-one including myself is above criticism.

I've been a professional programmer for 30 years, with my own software house for the last 25 years, but I would never use that to deflect criticism, as you just did.

In the end you've agreed with me - that the tool should fit the task. I was criticised for not using the sledgehammer of twister for a very small task.

Please look at my signature message.

**pianorain** · 08-13-2010

Originally Posted by johnggold

I was criticised for not using the sledgehammer of twister for a very small task.

This is incorrect. The initial criticism:

Originally Posted by laserlight

johnggold: is there any special reason to use this PRNG instead of some existing library, e.g., the TR1 random number facilities, or even just rand() from <cstdlib>?

You then proceeded to try to defend your algorithm. Your algorithm may be sufficient for whatever purposes you use it for. It is not suitable as a general-purpose PRNG, and you should not present it as such. Your algorithm may be "better over small ranges" for your definition of "better" for a specific application. It is not "better" as a general-purpose PRNG.

**iMalc** · 08-13-2010

Originally Posted by johnggold

Firstly, you will find it wasn't me that cast the first stone - I've been defending.

The problem has come about in that you appeared to be defending the randomness of the implementation you posted. It was perfectly sensible to correct you there.

Originally Posted by johnggold

This is all getting out of hand. I can never understand why shouting and abuse is necessary or desirable (imalc).

There was no shouting at any point. Capitals indicates shouting. I bolded a mere 3 words for emphasis where it was due, that's it. I shall switch to using underline for emphasis.
It would be helpful to me if you could point out what I said that you found to be the most abusive and I shall endevaour to censor myself moreso in future.

Originally Posted by johnggold

I take it that you used the routine recursively

You're probably using the word "recusively" incorrectly there. A recursive function is one that calls itself "recursively". Inductively a random number here is based on the previous, which is based on the one before that etc, but this does not entail recursion. You simply mean "repeatedly", and yes all of us here know how to seed a PRNG once and only once, prior to generating several random numbers.

It turns out that you're actually not defending the randomness of that method, you're actually defending its speed and balance. I do not doubt its speed in the slightest. Balance is not a measure of randomness, in fact given that it is at odds with independence of outcomes, it is effectively a measure of non-randomness.
I have no problem with you defending how balanced it is, but you need to acknowledge that you are therefore not defending how random it is. In effect you are virtually defending how non-random it is! (That's fine, it simply means we're both right)

Yes a Mersenne twister requires more operations than an LCRNG and hence is a little slower. Not much slower, but certainly a little. I don't think I've ever personally needed to use it because I understand how to use rand() effectively, and avoid producing bias where it matters. I would only use something better when randomness is much more important than normal. Enough said.

In case you stil aren't convinced about the non-randomness of the code you posted, the following is a technical discussion which gives you some experiments you can run which demonstrate many of the flaws in its actual randomness.

Here is another LCRNG. This is meant to be compiled on a system where an int is 32-bit:

Code:

unsigned int holdrand;
void srand(unsigned int seed) {
	holdrand = seed;
}
int rand(void) {
	return (((holdrand = holdrand * 214013L + 2531011L) >> 16) & 0x7FFF);
}

As you can see it contains the same multiplication, addition, and bitmask operations. It appears slightly obduscated because it's all written in one line, but we wont worry about that as we're all C or C++ programmers and can tell what it does regardless. I didn't write this it's taken from the web. Search for those two magic numbers and you'll find hundreds of copies of this all over the place.

Because holdrand is 32-bits and the magic numbers are carefully chosen, this generator cycles through 2^32 internal states. Because it too only generates numbers less than 0x7FFF each time, you can tell that even without reducing the range of the result, that when it produces a specific value e.g. 1234 that whatever number follows it can be different from what followed it last time. Because the LCRNG you posted only has 2^15 internal states, if it outputs 1234 followed by 567 (for example), then next time it generates 1234, it will again be followed by 567. This is why a longer sequence is more important for randomness. The more you subsequently reduce the range the more it reduces the noticeability of this pattern of course, but if you want numbers in a large range, this effect becomes highly noticeable.

Some 64-bit systems may use a LCRNG with a 64-bit internal state, this is somewhat superrior again as it allows for generating more bits of randomness using the same number of CPU operations, just with larger constants etc.

Another thing I mentioned earlier is the method of reducing the range to what the caller wants. Your method uses this technique, though in fact you've unfortunately built it into the same method:

Code:

int random(int min, int max)
{
    int range = max-min+1;
    return min + rand() % range;
}

You haven't indicated that you are familar with the rejection technique, so here is an implementation of it:

Code:

int random(int min, int max)
{
    int range = max-min+1;
    int cutoff = (RAND_MAX / range) * range;
    int val;
    do {
        val = rand();
    } while (val >= cutoff);
    return min + val % range;
}

Yes it's clearly not faster. It's not supposed to be. It simply ensures that the resulting distribution is uniform. This is for where accuracy of the distribution is more important than speed. If you want to see the difference, try generating 1,000,000 random values in the range 1 to 10000 and then produce a histogram of the results. This easily shows how vastly non-uniform the first method is.

The implementation you posted is also missing the bit shift of a slightly more modern LRCNG. It is well known that the lower bits are less random, a clear source of non-randomness in that implementation. Given it is for a 16-bit system however, this is understandable, since there aren't really enough bits to throw away the less random ones. If you take the least-significant bit of the result of your method (after stripping away the code for reducing the range, or generating numbers with a range of 0 to 32767), you will see a very predicatable pattern. The code I posted does not suffer from this anywhere near as much.

There's at least three concrete bits of evidence as to why the algorithm you posted is significantly less random than something a little bit more modern. I already knew all of this very well before you posted.
Rather than going into automatic defense mode you need to realise that of all the people out there on the net, you will eventually run into someone who does know more about a particular topic, and you should be prepared to listen and take stuff in occasionally. Of course you can expect someone else's ego to take a hit when you immediately state that someone else does not have your experience, expecially when jumping to that conclusion based on such a small initial post.

Lastly, I refer you to this page Statistical randomness - Wikipedia, the free encyclopedia which contains several types of statistical randomness tests that can very easily spot the nonrandomness in the generator you posted. Due to the method of range reduction used, your code does not even pass the basic frequency test.
In this case though, your code fails a simple visual inspection according to the knowledge I already have, and using such tests is unnecessary overkill.

**johnggold** · 08-13-2010

At last - a bit of code - I never thought I'd see it on this forum!

In most commercial applications on small scale ranges balance is more important, and if the balance is fit for purpose, then that's what matters.

Of course there are lots of times when improved randomness is beneficial such as games programming, but most don't need it, and speed dominates.

I have the same applications (still) running on MSDOS, Win3.x right up to Windows 7, 2008 server, and each version will provide identical results for the same input, which makes support a lot easier.

Up until last year, the world's largest manufacturer of machinery in our market was still using MSDOS 3 PC's on new machinery using licences bought in bulk decades ago. Others are no more advanced - using Windows NT4 workstation.

Code:

unsigned int holdrand;
void srand(unsigned int seed)
{
holdrand = seed;
}

int rand(void)
{
return (((holdrand = holdrand * 214013L + 2531011L) >> 16) & 0x7FFF);
}

The code with limiting rand() is very similar to work I did a long time ago, before I used the method we've been discussing.

However, the code above looks promising albiet 32 bit specific. I can insert that fairly easily, and I will check it's influence on outcomes on a test database.

Thread: integer arrays

Thread Tools

Search Thread

Display

Similar Threads

Access Violation

Parallel arrays, dynamic memory allocation, and sorting arrays

newbie programmer - needs help bad.

checking if a char * is an integer

size of integer arrays