Thread: Various optimization questions

  1. #31
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Perhaps start with a simplified version which actually works everywhere, rather than a buggy one which only works in one place by accident.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  2. #32
    The larch
    Join Date
    May 2006
    Posts
    3,573
    I get a warning that square might be used uninitialized and that seems to be the problem indeed.

    However, it seems strange that you are outputting while the timing. Printing to console is a rather slow and it would make more sense to time how long it takes to generate primes up to [various sizes] and only then may-be validate that they are actually right, for example against a known list.
    Last edited by anon; 07-31-2007 at 03:18 PM.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  3. #33
    </3 Segfaults
    Join Date
    Jul 2007
    Posts
    27
    Quote Originally Posted by anon View Post
    I get a warning that square might be used uninitialized and that seems to be the problem indeed.
    Yes it is. I didn't notice that when I updated my code. Thanks for the heads-up.

  4. #34
    </3 Segfaults
    Join Date
    Jul 2007
    Posts
    27
    It turns out that the code of actually determining the primes is virtually instant. It's the displaying that's the problem. =(

    Is there a way I could multi-thread this?

  5. #35
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Writing the results to a file will be much quicker than writing to cout.
    Also, post your latest bug-fixed code.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #36
    </3 Segfaults
    Join Date
    Jul 2007
    Posts
    27
    Quote Originally Posted by Salem View Post
    Writing the results to a file will be much quicker than writing to cout.
    Also, post your latest bug-fixed code.
    It's at page 2. I'm not quite sure if we're allowed to write it to file; let me check.
    Last edited by Differ; 07-31-2007 at 04:21 PM. Reason: gramur nazee

  7. #37
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Writing to console is faster if:
    1. You write many numbers to one line (so not a newline for every number, perhpaps you can make nice 16-space columns -> 5 numbers per row). Better yet, check out how many digits you need, add a space or two per line and print more lines for the few short ones, then longer lines as needed. Try to break lines between numbers, rather than in the middle of the number, tho'.

    2. The console is in "full-screen VGA mode". That means, if you are writing to a shell/command window, the graphics is being scrolled as pixels (albeit a few pixels at a time, say 16 or so), but in VGA mode, the text is actually bytes, so you get much more data movement from a few bytes.

    --
    Mats

  8. #38
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If we ignore (i.e. remove) the printout, your code is slightly slower than my "obvious" solution of just storing a prime and trying to divide with them:

    Code:
    void doPrimes(void)
    {
      double duration;
      
      clock_t start, finish;
      
      int i;
      int found;
      int n;
      int *p;
      int *limit;
    
      if ((primes = (int *)malloc(maxSize * sizeof(int))) == NULL)
        {
          fprintf(stderr, "Error allocating memory (%d)\n", maxSize * sizeof(int));
          exit(1);
        }
    
      start = clock();
      primes[0] = 2;  /* Start with a few known prime numbers */
      n = 1;
      limit = primes;
      for(i = 3; n < maxSize; i += 2)   // We know that even numbers are NOT primes!
        {   
          found = 1;
          for (p = primes; p != limit; p++)
            {
    	  if ((i % *p) == 0)
                {
    	      found = 0;
    	      break;
                }
            }
          if (i > *limit * *limit)
    	limit++;
          if (found)
            {
    	  primes[n++] = i;
            }
        }
      finish = clock();
      duration = (double)(finish - start) / CLOCKS_PER_SEC;
      printf("Time to find %d primes is %2.1f seconds\n", n, duration);
      printf("Largest prime = %d  Limit = %d\n", primes[n-1], *limit);
    }
    My code was written some time ago, and it doesn't take a "max prime" number, but rather how many primes to find - so I run mine first with some arbitrarily large amount of memory (2M), then use the reported largest prime and use it in your code.

    Your code compiles fine on MS VC7 with the small change of moving the "using namespace std" down to after the iostream include line.

    My code takes 35.3 seconds to do "2M" of primes, reaching 34122199. Your code, on the same machine, using the same compiler, takes 37.6 seconds.

    I haven't done anything to analyze where you're loosing time. I suspect it's your "cleverness" in doing "x *3 / 2" to index arrays - that doesn't look too good for a compiler to do quickly.

    I'm also quite sure that you would be better off using the suggested statement of:
    smn = (smn == 4) ?2:4;
    instead of
    smn = 2 + (smn == 2) * 2

    Or perhaps "smn ^= 6;"?

    --
    Mats

  9. #39
    </3 Segfaults
    Join Date
    Jul 2007
    Posts
    27
    Rules are one prime per line, 20 seconds to generate primes. It's mainly the displaying that's causing problems. The generating is minimal.

    I didn't know how smn = (smn == 4) ?2:4; worked, so I didn't use it.

    smn ^= 6; sounds like an excellent idea. It has now replaced smn = (smn == 2) * 2 + 2;.

    As for the * 3 / 2, I can't think of a better way to multiply by 1.5.

    TEST RESULTS:

    Clocking it to 10 mil without cout takes 6 secs.
    Clocking it to 10 mil with cout takes 67 secs.
    Therefore, the cout operation takes 61 secs. =(

    I noticed that right when it hits 1 mil it slows down noticeably.
    Last edited by Differ; 07-31-2007 at 05:29 PM.

  10. #40
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937
    Code:
    smn = (smn == 4) ?2:4
    set smn to "smn equals 4 is true? then 2. Is not true? then 4."
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  11. #41
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > It's mainly the displaying that's causing problems. The generating is minimal.
    Yes, I/O can be very expensive in real-time use, but not in terms of processing time. All that waiting around for some slow device to catch up hurts.

    Without any messy I/O, I set the limit to 10,000,000 in your code and mine, and got these timings (the slow one is yours).
    Code:
    $ g++ -O2 -o yin.exe prime_yin.cpp
    $ g++ -O2 foo.cpp
    $ time ./a.exe
    
    real    0m3.304s
    user    0m3.024s
    sys     0m0.040s
    $ time ./yin.exe
    
    real    0m11.016s
    user    0m10.645s
    sys     0m0.030s
    $ time ./yin.exe
    
    real    0m11.046s
    user    0m10.645s
    sys     0m0.030s
    $ time ./a.exe
    
    real    0m3.275s
    user    0m3.014s
    sys     0m0.050s
    And this is the code I used
    Code:
    #include <iostream>
    #include <cmath>
    using namespace std;
    
    int nextPrime ( bool *flags, int currentPrime ) {
        int nextPrime = currentPrime + 1;
        while ( !flags[nextPrime] ) {
            nextPrime++;
        }
        return nextPrime;
    }
    
    int main ( ) {
        int limit;
        //!!cout << "How many" << endl;
        limit = 10000000; //!!cin >> limit;
    
        // A set of flags, indicating the initial guess that all are prime
        bool *flags = new bool[limit];
        for ( int i = 0 ; i < limit ; i++ ) {
            flags[i] = true;
        }
    
        // Sieve them
        int check = int( sqrt(limit) ) + 1;
        for ( int c = 2 ; c <= check ; c = nextPrime(flags,c) ) {
            // Now eliminate all the multiples of c
            for ( int s = c * 2 ; s < limit ; s += c ) {
                // Not prime after all
                flags[s] = false;
            }
        }
    
        // Print out those which really are prime
        for ( int r = 2 ; r < limit ; r++ ) {
            if ( flags[r] ) {
                //!!cout << r << endl;
            }
        }
    
        delete [] flags;
    
        return 0;
    }
    Which is a rather literal interpretation of http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #42
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Salem,

    You need a newer computer - or gcc is doing a poor job of this. My Athlon64 3200+ (Socket 754) does the same in 1 second, and it's not exactly the "top of the range" at this time... ;-) [Yes, that's a JOKE]

    I also have a "bitsieve" implementation, which uses a bitmap as the array, rather than a byte per entry. It is a little bit quicker for larger primes.

    But sieve is not good if you need to find "more primes than your memory can hold".

    As to the divide by 1.5 - I'd expect it to be faster if you use just a plain index, not trying to save memory (you can do one of the two, generically: save memory or save CPU-time - but not both).

    Yes, at around 1M primes you've probably reached the limit of your L2 cache, which means that data has to be passed back and forth between main memory, so slower than using the cache. Unfortunately, there's not much you can do about that.

    --
    Mats

  13. #43
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > My Athlon64 3200+ (Socket 754) does the same in 1 second,
    So my 1.5Ghz laptop with <256M of memory is hot stuff by comparison
    Plus running inside cygwin won't help at all where there is any interaction with the environment.

    Yes, it's time to go shopping for a new machine or two.

    Anyway, it's the comparison not the absolute values which matters.
    Last edited by Salem; 08-01-2007 at 03:44 AM. Reason: laptop has variable clock
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  14. #44
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Salem View Post
    > My Athlon64 3200+ (Socket 754) does the same in 1 second,
    So my 1.5Ghz laptop with <256M of memory is hot stuff by comparison
    Plus running inside cygwin won't help at all where there is any interaction with the environment.
    I did point out that it was a joke! ;-)
    Yes, it's time to go shopping for a new machine or two.

    Anyway, it's the comparison not the absolute values which matters.
    Yes, completely agree.
    With same parameters in my tests (as I don't like tests that take 1s or less, I increased it a bit to 10000007):
    Yin: 168.0s
    Plain Sieve (almost the same as Salem's code): 11.7s
    Bit Sieve (using one bit per flag, instead of a byte): 3.8s

    --
    Mats
    Last edited by matsp; 08-01-2007 at 04:02 AM. Reason: Fix quotes

  15. #45
    The larch
    Join Date
    May 2006
    Posts
    3,573
    If you are supposed to print the primes within the 20 seconds, then the competition is slightly pointless: I/O is the main bottle-neck for which micro-optimisations such as using arrays of bools (are you sure it even has a positive impact) don't help much. All in all it comes to how fast the display scrolls which might eventually make any implementation practically equal.

    I think I suggested it before, but you'll get a more meaningful comparison if you change the rules, remove output requirement and time how long it takes to find N primes. You may only then be required to output them somehow (probably to file) to be able to see that the algorithm is correct in the first place.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. questions....so many questions about random numbers....
    By face_master in forum C++ Programming
    Replies: 2
    Last Post: 07-30-2009, 08:47 AM
  2. Turn Off Optimization?
    By danlee58 in forum C Programming
    Replies: 6
    Last Post: 12-10-2008, 03:52 AM
  3. A very long list of questions... maybe to long...
    By Ravens'sWrath in forum C Programming
    Replies: 16
    Last Post: 05-16-2007, 05:36 AM
  4. Trivial questions - what to do?
    By Aerie in forum A Brief History of Cprogramming.com
    Replies: 23
    Last Post: 12-26-2004, 09:44 AM
  5. questions questions questions.....
    By mfc2themax in forum A Brief History of Cprogramming.com
    Replies: 1
    Last Post: 08-14-2001, 07:22 AM