Perhaps start with a simplified version which actually works everywhere, rather than a buggy one which only works in one place by accident.

Printable View

- 07-31-2007Salem
Perhaps start with a simplified version which actually works everywhere, rather than a buggy one which only works in one place by accident.

- 07-31-2007anon
I get a warning that square might be used uninitialized and that seems to be the problem indeed.

However, it seems strange that you are outputting while the timing. Printing to console is a rather slow and it would make more sense to time how long it takes to generate primes up to [various sizes] and only then may-be validate that they are actually right, for example against a known list. - 07-31-2007Differ
- 07-31-2007Differ
It turns out that the code of actually determining the primes is virtually instant. It's the displaying that's the problem. =(

Is there a way I could multi-thread this? - 07-31-2007Salem
Writing the results to a file will be much quicker than writing to cout.

Also, post your latest bug-fixed code. - 07-31-2007Differ
- 07-31-2007matsp
Writing to console is faster if:

1. You write many numbers to one line (so not a newline for every number, perhpaps you can make nice 16-space columns -> 5 numbers per row). Better yet, check out how many digits you need, add a space or two per line and print more lines for the few short ones, then longer lines as needed. Try to break lines between numbers, rather than in the middle of the number, tho'.

2. The console is in "full-screen VGA mode". That means, if you are writing to a shell/command window, the graphics is being scrolled as pixels (albeit a few pixels at a time, say 16 or so), but in VGA mode, the text is actually bytes, so you get much more data movement from a few bytes.

--

Mats - 07-31-2007matsp
If we ignore (i.e. remove) the printout, your code is slightly slower than my "obvious" solution of just storing a prime and trying to divide with them:

Code:`void doPrimes(void)`

{

double duration;

clock_t start, finish;

int i;

int found;

int n;

int *p;

int *limit;

if ((primes = (int *)malloc(maxSize * sizeof(int))) == NULL)

{

fprintf(stderr, "Error allocating memory (%d)\n", maxSize * sizeof(int));

exit(1);

}

start = clock();

primes[0] = 2; /* Start with a few known prime numbers */

n = 1;

limit = primes;

for(i = 3; n < maxSize; i += 2) // We know that even numbers are NOT primes!

{

found = 1;

for (p = primes; p != limit; p++)

{

if ((i % *p) == 0)

{

found = 0;

break;

}

}

if (i > *limit * *limit)

limit++;

if (found)

{

primes[n++] = i;

}

}

finish = clock();

duration = (double)(finish - start) / CLOCKS_PER_SEC;

printf("Time to find %d primes is %2.1f seconds\n", n, duration);

printf("Largest prime = %d Limit = %d\n", primes[n-1], *limit);

}

Your code compiles fine on MS VC7 with the small change of moving the "using namespace std" down to after the iostream include line.

My code takes 35.3 seconds to do "2M" of primes, reaching 34122199. Your code, on the same machine, using the same compiler, takes 37.6 seconds.

I haven't done anything to analyze where you're loosing time. I suspect it's your "cleverness" in doing "x *3 / 2" to index arrays - that doesn't look too good for a compiler to do quickly.

I'm also quite sure that you would be better off using the suggested statement of:

smn = (smn == 4) ?2:4;

instead of

smn = 2 + (smn == 2) * 2

Or perhaps "smn ^= 6;"?

--

Mats - 07-31-2007Differ
Rules are one prime per line, 20 seconds to generate primes. It's mainly the displaying that's causing problems. The generating is minimal.

I didn't know how smn = (smn == 4) ?2:4; worked, so I didn't use it.

smn ^= 6; sounds like an excellent idea. It has now replaced smn = (smn == 2) * 2 + 2;.

As for the * 3 / 2, I can't think of a better way to multiply by 1.5.

TEST RESULTS:

Clocking it to 10 mil without cout takes 6 secs.

Clocking it to 10 mil with cout takes 67 secs.

Therefore, the cout operation takes 61 secs. =(

I noticed that right when it hits 1 mil it slows down noticeably. - 07-31-2007CodeMonkeyCode:
`smn = (smn == 4) ?2:4`

- 08-01-2007Salem
> It's mainly the displaying that's causing problems. The generating is minimal.

Yes, I/O can be very expensive in real-time use, but not in terms of processing time. All that waiting around for some slow device to catch up hurts.

Without any messy I/O, I set the limit to 10,000,000 in your code and mine, and got these timings (the slow one is yours).

Code:`$ g++ -O2 -o yin.exe prime_yin.cpp`

$ g++ -O2 foo.cpp

$ time ./a.exe

real 0m3.304s

user 0m3.024s

sys 0m0.040s

$ time ./yin.exe

real 0m11.016s

user 0m10.645s

sys 0m0.030s

$ time ./yin.exe

real 0m11.046s

user 0m10.645s

sys 0m0.030s

$ time ./a.exe

real 0m3.275s

user 0m3.014s

sys 0m0.050s

Code:`#include <iostream>`

#include <cmath>

using namespace std;

int nextPrime ( bool *flags, int currentPrime ) {

int nextPrime = currentPrime + 1;

while ( !flags[nextPrime] ) {

nextPrime++;

}

return nextPrime;

}

int main ( ) {

int limit;

//!!cout << "How many" << endl;

limit = 10000000; //!!cin >> limit;

// A set of flags, indicating the initial guess that all are prime

bool *flags = new bool[limit];

for ( int i = 0 ; i < limit ; i++ ) {

flags[i] = true;

}

// Sieve them

int check = int( sqrt(limit) ) + 1;

for ( int c = 2 ; c <= check ; c = nextPrime(flags,c) ) {

// Now eliminate all the multiples of c

for ( int s = c * 2 ; s < limit ; s += c ) {

// Not prime after all

flags[s] = false;

}

}

// Print out those which really are prime

for ( int r = 2 ; r < limit ; r++ ) {

if ( flags[r] ) {

//!!cout << r << endl;

}

}

delete [] flags;

return 0;

}

- 08-01-2007matsp
Salem,

You need a newer computer - or gcc is doing a poor job of this. My Athlon64 3200+ (Socket 754) does the same in 1 second, and it's not exactly the "top of the range" at this time... ;-) [Yes, that's a JOKE]

I also have a "bitsieve" implementation, which uses a bitmap as the array, rather than a byte per entry. It is a little bit quicker for larger primes.

But sieve is not good if you need to find "more primes than your memory can hold".

As to the divide by 1.5 - I'd expect it to be faster if you use just a plain index, not trying to save memory (you can do one of the two, generically: save memory or save CPU-time - but not both).

Yes, at around 1M primes you've probably reached the limit of your L2 cache, which means that data has to be passed back and forth between main memory, so slower than using the cache. Unfortunately, there's not much you can do about that.

--

Mats - 08-01-2007Salem
> My Athlon64 3200+ (Socket 754) does the same in 1 second,

So my 1.5Ghz laptop with <256M of memory is hot stuff by comparison ;)

Plus running inside cygwin won't help at all where there is any interaction with the environment.

Yes, it's time to go shopping for a new machine or two.

Anyway, it's the comparison not the absolute values which matters. - 08-01-2007matsp
I did point out that it was a joke! ;-)

Quote:

Yes, it's time to go shopping for a new machine or two.

Anyway, it's the comparison not the absolute values which matters.

With same parameters in my tests (as I don't like tests that take 1s or less, I increased it a bit to 10000007):

Yin: 168.0s

Plain Sieve (almost the same as Salem's code): 11.7s

Bit Sieve (using one bit per flag, instead of a byte): 3.8s

--

Mats - 08-01-2007anon
If you are supposed to print the primes within the 20 seconds, then the competition is slightly pointless: I/O is the main bottle-neck for which micro-optimisations such as using arrays of bools (are you sure it even has a positive impact) don't help much. All in all it comes to how fast the display scrolls which might eventually make any implementation practically equal.

I think I suggested it before, but you'll get a more meaningful comparison if you change the rules, remove output requirement and time how long it takes to find N primes. You may only then be required to output them somehow (probably to file) to be able to see that the algorithm is correct in the first place.