integer arrays

This is a discussion on integer arrays within the C++ Programming forums, part of the General Programming Boards category; is there a way to assign weights to indices in an array than use those to influence my choice? I ...

  1. #1
    Registered User
    Join Date
    Dec 2009
    Posts
    47

    integer arrays

    is there a way to assign weights to indices in an array than use those to influence my choice? I want to write a function to randomly choose indices based on these weights (this way if there are 0 entries, or entries less than other ones the choice will be influenced).

  2. #2
    Registered User
    Join Date
    Aug 2010
    Location
    England
    Posts
    86
    There are lots of ways.

    Lets say that there are 10 outcomes, with weighting as follows :

    1 5
    2 3
    ...
    10 23


    The array would contain 1, 1, 1, 1, 1, 2, 2, 2 .....

    To select a value in the range start to (start+count) count use the following function. SEED is a global value initialised to, say, 1000.

    Code:
    int randomgen( int start, int count )
    {
    /*
    This generates a pseudo random number generator which can be reproduced at
    will. This version is reasonably unlimited and is reasonably unbiased.
    */
    
    SEED = ( ( 0XA3ED * SEED ) + 0X1D31 ) & 0X7FFF;
    
    return ( abs( ( SEED % ( count - start + 1 ) + start ) ) );
    }
    With the example above, start would be 1, count would be 10.

    The return value would be the array value to select. Perfect weighting and totally pseudo random.

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    22,269
    johnggold: is there any special reason to use this PRNG instead of some existing library, e.g., the TR1 random number facilities, or even just rand() from <cstdlib>?
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Registered User
    Join Date
    Aug 2010
    Location
    England
    Posts
    86
    Yes.

    This one is much better with small ranges. I found that Microsofts random number generator was not very well balanced.

    I use this routine for weighted sampling myself.

    As it is pseudo random, you can run a program over and over and still get the same answer every time.

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    22,269
    Quote Originally Posted by johnggold
    This one is much better with small ranges. I found that Microsofts random number generator was not very well balanced.
    That might be a good reason not to use rand(), but not necessarily a good argument against say, Mersenne Twister, which is available in the TR1 PRNG facilities.

    My concern is that unless you are citing a PRNG implementation that has been discussed in the public literature, or for which you can provide test results, tempster09 will essentially have to trust you on blind faith (or run statistical tests of randomness himself/herself). Beyond this, there is the issue that you are basically suggesting that one "reinvent the wheel" before reaching for existing library facilities.

    Quote Originally Posted by johnggold
    As it is pseudo random, you can run a program over and over and still get the same answer every time.
    But this is true of pretty much all PRNGs, so it is not a very interesting property when we are comparing them.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #6
    Registered User
    Join Date
    Aug 2010
    Location
    England
    Posts
    86
    I've been using the routine since 1983. It's not mine - it was originally in a book on BCPL - the forerunner of C, and I tested it extensively against rand() and other algorithms. I use the routine for weighted sampling over small ranges - max 200 - and over this range it is very stable and of course integer based.

  7. #7
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    My understanding is that the function is something like this:
    Code:
    int random(int weights[], int start, int end);
    so you would get a random index depending on an array of weights.

    Something like:
    Code:
    int random(vector<int>& weights)
    {
        int total = 0, found;
        int len = weights.size();
        int w[] = new int[len]
        for (int i = 0; i < len ; ++i) {
           total += weights[i];
           w[i] = total;       
        }           
        int index = rand() % total;
        for (int i = 0; i < len ; ++i)
           if (total < w[i]) {
              found = i;
              break;
           }
        delete w;
        return found;
    }
    So you would select a random index from 0 to weights.size().
    The "index" shows the index selected if you would expand the array as shown on post #2.

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,308
    Quote Originally Posted by johnggold View Post
    I've been using the routine since 1983.
    Right, so it's at least 17 years old...
    That's not surprising since it is just a very primitive LCRNG, and even at a glance it is obvious that the period until the sequence repeats (for example) is 2^15 at best. Even the rand() implementation in say VS6 (ten years old) which is also a simple LCRNG, has a period of 2^32 until it repeats. That's 262144 times longer!

    I.e what you posted sucks compared to even what most of us would consider very old. If you consider something relatively modern such as Mersenne Twister, then you may as well be comparing a snail to a formula 1 car.

    Your Pseudo Random Number Generator (PRNG) knowledge is simply woefully outdated. Whether you choose to acquire more modern knowledge in this area is up to you, but you'd be best not to pass on your current knowledge on to anyone as anything more than a history lesson on how much worse things used to be back in the day.


    tempster09:
    The way to do this is to create another array the same length as the one with your weights, and fill this with the cumulate values of the weights. I.e. for weights of:
    2, 4, 3, 2, 4
    you'd fill the cumulative array with:
    2, 6, 9, 11, 15
    Cumulative means that you add the previous total to the current weight each time and store the result.
    Next you generate a random number between 0 and the last value in the array. I.e. in this case between 0 and 15 inclusive.
    Lastly, you look for that random number in the cumulative array by performing a binary search, finding the number with the smallest index that is greater than or equal to the random number.
    E.g. if your rnadom number was 7, your binary search should result in finding the 9, at index 2.
    The answer then for that trial run, is index 2 (where the 3 was in your original array).

    Note that you can of course also do this using a linear search as C_ntua did rather than with a binary search. That's fine if you only have a small number of weights. But if you have a lot of weights, there's a standard library function called std::lower_bound that does that part for you, just use that.
    Also, make sure that your weights don't sum up to a value that overflows your data type. Since a negative weight makes no sense, you should use an unsigned type anyway, which as it happens, doubles the sum your algorithm can safely handle.
    Last edited by iMalc; 08-12-2010 at 03:00 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  9. #9
    Registered User
    Join Date
    Aug 2010
    Location
    England
    Posts
    86
    It's clear that you don't have my experience, because you are fixated with randomness and do not consider balance and the importance of accuracy to the application.

    When using a weighted sample balance is much more important. Balance in this case is crucial, as if there is any bias - which rand() is notorious for - you will not get a good result on small samples.

    You should always consider the well known razor blade argument, where Gillette developed at great expense blade sharpness way beyond the point where performance could actually be influenced.

    Have you tested your routine against mine for balance. Probably not.

    I assume that you have heard of the drunken man's walk problem. If not, go and find out.

    I think it is highly presumptuous to assume that because a routine is old, it is necessarily bettered by more modern algorithms, and it worries me greatly that you think that change for changes sake is a good idea.

    It is particularly presumptuous to assume that we don't check out the latest algorithms all the time.

    I can only presume that you do not work in a field where code reliability is more important than using the latest algorithms, and I have spent too much time dealing with the mess left by programmers with such an attitude.

    You completely missed out on the small range parameter - whats the point of millions of different values when there are so few possible outcomes.

    I suggest that you build a test program. Use your algorithm and mine, test for balance over a small range.

    Then do a speed check. I have to call this routine millions of times per minute in one of our polygon nesting applications.

    When you've done that - I can provide you with our test code if you don't know how - perhaps you can comment with evidence to back your assertion.

  10. #10
    Registered User
    Join Date
    Aug 2010
    Location
    England
    Posts
    86
    Just out of interest, I revisited the Mersenne Twister code - haven't looked at it for a while.

    Reminded me about its sensitivity to seeding, which I'd forgotten, and the relatively huge amount of code it requires to deliver an outcome, especially if called millions of times.

    Mersenne Twister is a favourite of games programmers - would be intereesting to see a speed performance change if they used my algorithm instead, and where actual game performance would suffer in any user identifiable way.

  11. #11
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    22,269
    Quote Originally Posted by johnggold
    You completely missed out on the small range parameter - whats the point of millions of different values when there are so few possible outcomes.
    I am a little confused here: as far as I can tell, your general idea is to create an array where each element is repeated for as many times as its weight. One of these elements is then picked at random. Consequently, this does not necessarily imply a small range since the total sum of weights may well exceed your stated maximum of 200, even if the number of elements is only 10.

    EDIT:
    Even if we change this to use the more space efficient array of cumulative weights instead, the pseudorandom number would need to be generated with respect to the cumulative weight, not the number of elements.
    Last edited by laserlight; 08-12-2010 at 04:13 AM.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  12. #12
    Registered User
    Join Date
    Aug 2010
    Location
    England
    Posts
    86
    I did say that there are lots of ways. I used that method because it's easy to easier to explain.

    The important point is to use the random number to select.

    What I actually do is populate the array with the count values for each possible outcome

    So ....

    Cell 1 has 10
    Cell 2 has 5
    Cell 3 has 11
    Cell 12 has 2

    Say the RND generates 17. This would drop you in cell 3. 10 + 5 is 15 - not enough. 10 + 5 + 11 is more than 17, so there you have it.

    If in your application the number of cells in the array is going to be very large, then you may consider a different storage mechanism.

    In my application, I use this to select a cell, then decrement its count, as I randomly select cells untill all the cells are empty.

    You haven't explained what you need the weighted sample for.

  13. #13
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    22,269
    Quote Originally Posted by johnggold
    What I actually do is populate the array with the count values for each possible outcome
    (...)
    Say the RND generates 17. This would drop you in cell 3. 10 + 5 is 15 - not enough. 10 + 5 + 11 is more than 17, so there you have it.
    Yeah, that would effectively be a variant of the use of cumulative weights, except that you will not be able to use binary search.

    Quote Originally Posted by johnggold
    If in your application the number of cells in the array is going to be very large, then you may consider a different storage mechanism.
    I think you're missing my point. Your point concerning this PRNG that your proposed is that "this one is much better with small ranges". Even if we accept this at face value, I am pointing out that where weights are concerned, "small ranges" does not necessarily apply, for your definition of small. It is not the number of elements of the array that matter, it is the sum of the weights that matter.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  14. #14
    Registered User
    Join Date
    Aug 2010
    Location
    England
    Posts
    86
    Absolutely correct. In my application, I'm typically using this method to select polygons at random for 2D nesting. The total number of polygons i.e the accumulated total weight is normally in the 200 max mark. We do go up to 1000, but this is rare.

    That's why I'm interested in the application detail - what is the weight total envisaged, and how important is the accuracy of random selection.

    With my application balance is crucial - it may not be so important in this application.

  15. #15
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,399
    Quote Originally Posted by johnggold View Post
    With my application balance is crucial - it may not be so important in this application.
    Oh yes, it's definitely balanced. In fact, while generating 256 numbers from 0 to 255 using your algorithm, it hits each number exactly once, regardless of any of the initial SEED values I've tried. While generating 512 numbers from 0 to 255, it hits each number twice. This isn't random; you might as well just increment from start to count.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

Page 1 of 2 12 LastLast
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Access Violation
    By Graham Aker in forum C Programming
    Replies: 100
    Last Post: 01-26-2009, 08:31 PM
  2. Replies: 16
    Last Post: 01-01-2008, 04:07 PM
  3. newbie programmer - needs help bad.
    By hortonheat in forum C Programming
    Replies: 17
    Last Post: 10-20-2004, 06:31 PM
  4. checking if a char * is an integer
    By gregulator in forum C Programming
    Replies: 5
    Last Post: 04-16-2004, 10:12 AM
  5. size of integer arrays
    By steve8820 in forum C++ Programming
    Replies: 2
    Last Post: 03-06-2002, 07:31 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21