Thread: My random numbers all seem to factors of 7!

  1. #1
    Registered User
    Join Date
    May 2020
    Posts
    3

    My random numbers all seem to factors of 7!

    I've been meaning to learn C for a long time, so I started with a simple montecarlo simulation of the premium bonds. I couldn't work out how to generate random numbers bigger than RAND_MAX, so split my random numbers win/lose, and then win how much. Something seemed wrong, because I wasn't seeing the same results I got in other languages, so I added a count of the random numbers generated:

    This is code I wrote (on MacOS Mohave using CLANG to compile):

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <string.h>
    
    int cmpfunc (const void * a, const void * b) {
       return ( *(int*)a - *(int*)b );
    }
    
    
    int main() 
    {
        int mybonds, period, numberoftrials, ernie, prize, win, i, j, k, m;
        int winchance = 24500;
        int totalprizes = 3513356;
        int sum = 0;
        static int randomnumbers[3513356];
        memset(randomnumbers, 0, sizeof(randomnumbers));
        
        srand(time(0));
        
        printf("Number of bonds? : ") ;
        scanf("%d", &mybonds) ;
        printf("Number of months? : ");
        scanf("%d", &period);
        printf("Number of trials? : ") ;
        scanf("%d", &numberoftrials) ;
        int *results = (int*) malloc(numberoftrials * sizeof(int));
    
    
        for (i = 0; i < numberoftrials; i++) {
            results[i] = 0;
            for (m = 0; m < period; m++) {
                for (j = 0; j < mybonds; j++) {
                    if (rand() > RAND_MAX / winchance) {
                        results[i] += 0;
                    } else {
                        ernie = rand() % totalprizes;
                        randomnumbers[ernie] += 1;
                        if (ernie > 63374) {
                            results[i] += 25;
                        } else if (ernie > 35815) {
                            results[i] += 50;
                        } else if (ernie > 8256) {
                            results[i] += 100;
                        } else if (ernie > 2232) {
                            results[i] += 500;
                        } else if (ernie > 224) {
                            results[i] += 1000;
                        } else if (ernie > 103) {
                            results[i] += 5000;
                        } else if (ernie > 44) {
                            results[i] += 10000;
                        } else if (ernie > 19) {
                            results[i] += 25000;
                        } else if (ernie > 7) {
                            results[i] += 50000;
                        } else if (ernie > 1) {
                            results[i] += 100000;
                        } else {
                            results[i] += 1000000;
                        }
                    }
                }
            }
        }
        qsort(results, numberoftrials, sizeof(int),cmpfunc);
        printf("\n");
        printf("Worst result: %d\n", results[0]);
        printf("Best result: %d\n", results[numberoftrials - 1] );
        printf("Median result %d\n", results[numberoftrials / 2 + 1] );
        for (k = 0; k < totalprizes; k++) {
            sum += randomnumbers[k];
            if (randomnumbers[k] > 0) {
                printf("%d : %d\n", k, randomnumbers[k]);
            }
        }
        printf("number of winners = %d\n ", sum);
    
        free(results) ;
        return 0;
    }
    And this is the output:

    Number of bonds? : 50000
    Number of months? : 12
    Number of trials? : 10000

    Worst result: 225
    Best result: 100725
    Median result 625
    7 : 3
    14 : 2
    21 : 3
    28 : 3
    175 : 3
    182 : 3
    189 : 2
    196 : 2
    203 : 3
    350 : 3
    357 : 2
    364 : 3
    371 : 3
    525 : 3
    etc, etc
    2440698 : 12245
    2457505 : 13585
    2474312 : 114779
    2491119 : 802
    number of winners = 657216

    I haven't checked every single winning number, but so far every one I've picked is a multiple of 7.

    I did try putting in a new srand(time(0)) seed before the second rand(), but this just dramatically reduced the number of random numbers generated. Where have I gone wrong?

  2. #2
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    The normal rand function is more random in the higher bits than in the lower bits.

    Your logic is testing the lower bits only!

    Tim S.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are,in short, a perfect match.." Bill Bryson

  3. #3
    Registered User
    Join Date
    May 2020
    Posts
    3
    Thanks Tim,
    It's still behaving oddly, but in a different way, so I may not have implemented your suggestion properly.
    I changed line 38 to
    ernie = (rand()>>9) % totalprizes;
    but repeated trials are again all producing the same random numbers as each other (despite seeding srand with time), although they're no longer multiples of 7.
    I also tried masking the lower order bits with ernie = (rand() & 0x1FFFE0000) % totalprizes; That produced very similar results.
    David

  4. #4
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Code:
    ernie = (rand()>>9) % totalprizes;
    I would try
    Code:
    ernie = (rand()>>4) % totalprizes;
    Tim S.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are,in short, a perfect match.." Bill Bryson

  5. #5
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    But, you will likely have to live with bad random results.

    NOTE: There are other random functions. Maybe the others people can point to a better one.

    Tim S.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are,in short, a perfect match.." Bill Bryson

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Code:
                    if (rand() > RAND_MAX / winchance) {
                        results[i] += 0;
                    } else {
                        ernie = rand() % totalprizes;
    The attempt to somehow filter the numbers is a complete waste of effort.
    The rand() you use in the else has no relation to the rand() you tried to filter in the if.

    As opposed to say
    Code:
                    int r = rand();
                    if (r > RAND_MAX / winchance) {
                        results[i] += 0;
                    } else {
                        ernie = r % totalprizes;
    What is the value of RAND_MAX on your machine?

    When I compiled your code, I got a nice even distribution anyway.
    Code:
    $ awk 'NR <= 4 || $NF > 3' tmp.txt 
    Number of bonds? : Number of months? : Number of trials? : 
    Worst result: 200
    Best result: 50850
    Median result 625
    844969 : 4
    1017408 : 4
    number of winners = 245358
    Your basic libc rand() is barely good enough for student dice or card playing homework.

    Any serious use needs better generators.
    Mersenne Twister - Wikipedia
    which is available in
    GNU Scientific Library - Wikipedia
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by stahta01
    The normal rand function is more random in the higher bits than in the lower bits.

    Your logic is testing the lower bits only!
    We're talking about rand() % 3513356 though, which means the lower 21 to 22 bits, so I must admit that I found it hard to believe that the PRNG chosen could be so bad. Therefore, I did a bit of investigation...

    Quote Originally Posted by onewire
    This is code I wrote (on MacOS Mohave using CLANG to compile):
    I happen to have the same setup on my work laptop (although we don't actually program our projects in C), and was curious so I decided to test, and basically got similiar results. I simplified your program down to this test program:
    Code:
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    
    #define winchance 24500
    #define totalprizes 3513356
    #define THRESHOLD 127773
    
    int main(void)
    {
        srand(time(0));
        int div7_count = 0;
        int not_div7_count = 0;
        for (int i = 0; i < INT_MAX; ++i) {
            if (rand() <= THRESHOLD) {
                int n = rand() % totalprizes;
                if (n % 7 == 0) {
                    div7_count++;
                } else {
                    not_div7_count++;
                }
            }
        }
        printf("%d divisible by 7; %d otherwise\n", div7_count, not_div7_count);
        printf("%d\n", RAND_MAX / winchance);
        return 0;
    }
    The typical results that I got were reasonably in the range of:
    Code:
    127772 divisible by 7; 0 otherwise
    87652
    Increasing THRESHOLD resulted in a non-zero not_div7_count. So, it looks like for the particular rand() implementation chosen for the standard library implementation of this particular version of clang (Apple LLVM version 10.0.1 (clang-1001.0.46.4)) on this particular platform, whenever you have a pseudorandom number that is sufficiently small (less than 40000000 or so), the next pseudorandom number generated modulo 3513356 will more likely be divisible by 7, such that when the previous pseudorandom number is no greater than 127773, the next pseudorandom number modulo 3513356 will almost certainly be divisible by 7.

    What you could do is to avoid having two rand() calls per iteration, hence avoiding this weakness in the PRNG, at least to some extent. You would also make it less likely to exhaust the period of the PRNG.

    Nonetheless, I agree with stahta01: you're probably much better off just using a known PRNG for Monte Carlo rather than trying to save yourself from the problems of a poor rand() implementation. As for which PRNG to choose: I tend to be fond of the Mersenne Twister family of PRNGs because that was what I was introduced to with PHP nearly two decades ago, and its Wikipedia article can provide a starting point for more modern algorithms that you might want to try.

    Additionally, I suggest that instead of a magic number like 24500 for winchance, express winchance as a probability (either as a double, or as an integer percentage out of 100).
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    Registered User
    Join Date
    May 2020
    Posts
    3
    Thank you everyone for all the help. I can see I'm going to have to learn a little more and work out how to "borrow" someone else's random number generator.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Prime numbers and factors
    By hornet07 in forum C++ Programming
    Replies: 9
    Last Post: 11-17-2013, 04:17 PM
  2. Replies: 3
    Last Post: 02-19-2009, 10:32 PM
  3. Print Out Factors Of Numbers
    By tom24 in forum C Programming
    Replies: 6
    Last Post: 01-25-2006, 11:57 AM
  4. Replies: 4
    Last Post: 11-16-2004, 07:29 AM

Tags for this Thread