Sieve of Eratosthenes

Hello

Below is a working code for the sieve .. it has the wheel 23 optimization i.,e all multiples of 2 and 3 are directly ignored (not stored at all). It generates primes upto 10^8 in around ~ 1.6s
Can any 1 suggest a few more optimizations ?
Can the bits(bit,bit1,bit2 in code) be toggled any faster ?

Thank you

Code:

#include <ctime> #include <cmath> #include <cstdio> #include <algorithm> #define MAX 100000000 using namespace std; bool isprime[MAX/3]; void make_primes2() { int i,j,lim=sqrt(MAX); bool bit1=1,bit2; for(int i=5;i<=lim;i+=bit1?2:4,bit1=!bit1) { if(!isprime[i/3]) { bit2=1; for(int j=i;;) { j+=bit2?i<<2:i<<1; bit2=!bit2; if(j>=MAX)break; isprime[j/3]=1; } } } } int main() { make_primes2(); printf("%lf(s)\n",clock()/1e6); return 0; }

I don't know about speeding up the bit toggling, but you can certainly speed up those divisions by 3 significantly, like this:

Code:

unsigned divu3(unsigned n) { unsigned r; r = (0x55555555*n + (n >> 1) - (n >> 3)) >> 30; return (n - r)*0xAAAAAAAB; }

Read all about it here.

conditions could degrade performance due to branch prediction, so I would compare

Code:

j+=bit2?i<<2:i<<1;

with expression without branch like

Code:

j+=i<<(1+bit2);

Thanks

@vart
Thanks a lot .. that was neat .

@iMalc ..
I added ur suggestion but it surprisingly it takes longer than my earlier .. any idea what wrong with my implementation ?
All the required statements are added in the code :)

Code:

#include <ctime> #include <cmath> #include <cstdio> #include <algorithm> #define MAX 100000000 #define SIZE 33333433 //#define divu3(X) ((X - ((0x55555555*X + (X >> 1) - (X >> 3)) >> 30))*0xAAAAAAAB) using namespace std; bool isprime[SIZE]; unsigned divu3(unsigned n) { unsigned r; r = (0x55555555*n + (n >> 1) - (n >> 3)) >> 30; return (n - r)*0xAAAAAAAB; } void make_primes2() { int i,j,lim=sqrt(MAX); bool bit1=1,bit2; for(i=5;i<=lim;i+=bit1?2:4,bit1=!bit1) { // if(!isprime[i/3]) if(!isprime[divu3(i)]) { bit2=1; for(j=i;;) { j+=(i<<(1+bit2)); bit2=!bit2; if(j>=MAX)break; // isprime[j/3]=1; isprime[divu3(j)]=1; } } } } int main() { make_primes2(); printf("%lf(s)\n",clock()/1e6); return 0; }

Thanks again

I didn't bother to read the source, but GCC, if you are using GCC, can get very clever with this sort of repeated operation. What is the assembler telling you?

If you are on a PC with a great `idev' you probably could see a small reduction in speed.

Soma

Hi All,

Thanks for your information

but i am unable to get how can we find whether a given number is prime or not?

Plz help me out

You might get an improvement if you declare all of your ints as unsigned, mark the function as inline, and above all else, make sure you are timing release builds set to maximum optimisation.

Examining the source then may indicate that your compiler is already doing this trick or a version of it. I'm used to VisualStudio which doesn't do it. gcc can do that kind of thing in some circumstances aparently.

Re

@iMalc:

I tried declaring them as unsigned and also used a macro .. still doesn't speed up ...
So i can only assume that the compiler is already doing such optimizations. Are there any nice resources find out and also in general more compilers ?

Thanks a lot

These are my compiler specs:

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.1-4ubuntu9' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)