-
Sieve of Eratosthenes
Hello
Below is a working code for the sieve .. it has the wheel 23 optimization i.,e all multiples of 2 and 3 are directly ignored (not stored at all). It generates primes upto 10^8 in around ~ 1.6s
Can any 1 suggest a few more optimizations ?
Can the bits(bit,bit1,bit2 in code) be toggled any faster ?
Thank you
Code:
#include <ctime>
#include <cmath>
#include <cstdio>
#include <algorithm>
#define MAX 100000000
using namespace std;
bool isprime[MAX/3];
void make_primes2()
{
int i,j,lim=sqrt(MAX);
bool bit1=1,bit2;
for(int i=5;i<=lim;i+=bit1?2:4,bit1=!bit1)
{
if(!isprime[i/3])
{
bit2=1;
for(int j=i;;)
{
j+=bit2?i<<2:i<<1;
bit2=!bit2;
if(j>=MAX)break;
isprime[j/3]=1;
}
}
}
}
int main()
{
make_primes2();
printf("%lf(s)\n",clock()/1e6);
return 0;
}
-
I don't know about speeding up the bit toggling, but you can certainly speed up those divisions by 3 significantly, like this:
Code:
unsigned divu3(unsigned n) {
unsigned r;
r = (0x55555555*n + (n >> 1) - (n >> 3)) >> 30;
return (n - r)*0xAAAAAAAB;
}
Read all about it here.
-
conditions could degrade performance due to branch prediction, so I would compare
with expression without branch like
-
Thanks
@vart
Thanks a lot .. that was neat .
@iMalc ..
I added ur suggestion but it surprisingly it takes longer than my earlier .. any idea what wrong with my implementation ?
All the required statements are added in the code :)
Code:
#include <ctime>
#include <cmath>
#include <cstdio>
#include <algorithm>
#define MAX 100000000
#define SIZE 33333433
//#define divu3(X) ((X - ((0x55555555*X + (X >> 1) - (X >> 3)) >> 30))*0xAAAAAAAB)
using namespace std;
bool isprime[SIZE];
unsigned divu3(unsigned n)
{
unsigned r;
r = (0x55555555*n + (n >> 1) - (n >> 3)) >> 30;
return (n - r)*0xAAAAAAAB;
}
void make_primes2()
{
int i,j,lim=sqrt(MAX);
bool bit1=1,bit2;
for(i=5;i<=lim;i+=bit1?2:4,bit1=!bit1)
{
// if(!isprime[i/3])
if(!isprime[divu3(i)])
{
bit2=1;
for(j=i;;)
{
j+=(i<<(1+bit2));
bit2=!bit2;
if(j>=MAX)break;
// isprime[j/3]=1;
isprime[divu3(j)]=1;
}
}
}
}
int main()
{
make_primes2();
printf("%lf(s)\n",clock()/1e6);
return 0;
}
Thanks again
-
I didn't bother to read the source, but GCC, if you are using GCC, can get very clever with this sort of repeated operation. What is the assembler telling you?
If you are on a PC with a great `idev' you probably could see a small reduction in speed.
Soma
-
Hi All,
Thanks for your information
but i am unable to get how can we find whether a given number is prime or not?
Plz help me out
-
You might get an improvement if you declare all of your ints as unsigned, mark the function as inline, and above all else, make sure you are timing release builds set to maximum optimisation.
Examining the source then may indicate that your compiler is already doing this trick or a version of it. I'm used to VisualStudio which doesn't do it. gcc can do that kind of thing in some circumstances aparently.
-
Re
@iMalc:
I tried declaring them as unsigned and also used a macro .. still doesn't speed up ...
So i can only assume that the compiler is already doing such optimizations. Are there any nice resources find out and also in general more compilers ?
Thanks a lot
These are my compiler specs:
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.1-4ubuntu9' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)