Hi
Is there any way to do the modulo 12 operation (i.e, n%12 ) in a faster way (other than the obvious n%12 way ?
I am implementing sieve of atkin so need to speed it up
Thanks a lot
Hi
Is there any way to do the modulo 12 operation (i.e, n%12 ) in a faster way (other than the obvious n%12 way ?
I am implementing sieve of atkin so need to speed it up
Thanks a lot
Review the generated assembler, determine its inefficiencies, and code your own optimized inline assembler.
Mainframe assembler programmer by trade. C coder when I can.
I seriously doubt you can make anything faster than what is probably a single instruction to the CPU. Especially if you compile with full optimization, you're not going to get anything faster.
In fact, I think C++ is generally faster than assembly language, as the compiler can better understand a lot of optimization techniques that are practically impossible for humans to apply correctly: for instance mixing up certain instructions so that memory read/writes are further apart - but how many instructions should be in between to be able to be as fast as possible? I have no idea on that. In fact, I think it even depends on the number of clock cycles it takes for the instructions between the memory accessing.
It doesn't make sense to want to speed up a modulo 12 operation. That's not going to be the bottleneck. One should always concentrate on areas that are truly slow. Such as any implementation for finding primes will have complexities - even decision making along the way whose execution time will easily exceed any modulo.
Some very useful links for this:
http://www.hackersdelight.org/divcMore.pdf
Bit Tricks, Part I: Exact Division
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"
Right. I tried the modulo of the second link. Here are two versions I build to calculate modulo 12...
test_normal.cpp:
Code:#include <iostream> #include <cmath> int main() { double total = 0; for(int i = 0; i < 100000000; ++i) { if(i % 7 == 0) total += i; } std::cout << total << std::endl; }
test_hack.cpp:
Excuse me the horrible code. Anyways, first of all, they both output something different, so there seems to be something wrong with the hack-method (or I screwed it up). And even if there wasn't, the two ran at approximately the same speed (the hacked one even slower, but negligeable).Code:#include <iostream> #include <cmath> int main() { double total = 0; float flr = floor(pow(2, 32)/12); float mlt = 1.0/12.0; for(int i = 0; i < 100000000; ++i) { if(i * mlt <= flr) total += i; } std::cout << total << std::endl; }
So, no, I don't think it's worth it to add such hacks.
You must know all those bit-hacks that are listed on some page? Well, they used to be good, but if you look at it on current processors, you'll notice that the normal versions are just as fast nowadays, and often enough even faster. So don't use them, unless you actually try it and make sure it's faster.
No the problem is that you've misinterpreted the code. There is not supposed to be any floats whatsoever in the program, just a few precalculated integer constants.
Admittedly, it is totally unclear what inverse(7) means in that code (it's just some other integer constant).
I should have time to try it out myself later.
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"
Okay, I worked out what inverse(7) is for this example. It is 2454267027 which comes from:613566756 isCode:ceil((((1<<32) % 7)<<32) / 7)Try the following code:Code:floor((1<<32) / 7)That should give the same answer and be faster than the i%7 version. Note that a great deal of the time will probably be spent on the total += i; line because that requires an int to double conversion. If you really want to see how much quicker it is, you'll need to change i and total to unsigned int in both versions.Code:#include <iostream> #include <cmath> int main() { double total = 0; for (unsigned int i = 0; i < 100000000; ++i) { if (i * 2454267027 <= 613566756) total += i; } std::cout << total << std::endl; }
Note that for this hack for i%7, it only works up to 858993461 (0x33333335)
It can be tested basically like this:Code:for (unsigned int i=0; i<=858993461; ++i) { bool t1 = (i % 7) == 0; bool t2 = i * 2454267027 <= 613566756; assert(t1 == t2); }
Last edited by iMalc; 04-09-2010 at 04:13 PM.
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"
Appliying this technique to the original mod 12 problem. The following is one solution to if (n % 12 == 0):The extra !(n & 3) test is because this technique only works on odd numbers, so it's actually doing a test for n%3 in that multiplication, not n%12Code:if (!(n & 3) && n * 1431655766 <= 1431655765)
Note that it works only up to 2147483647 (0x7FFFFFFF), and it's up to you to test the performance difference.
I just realised that you actually need the result of the modulus rather than just testing it against zero, so I still recommend going through both of the above articles, and working it out from that.
Last edited by iMalc; 04-09-2010 at 04:42 PM.
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"
@iMalc:
The results still differ for me (I made it total++ and made total an integer). But, granted, the normal version was 50% slower than the hack.
Impossible! My units tests prove conclusively that they give the same answers all the way up the the limits I stated:I've also checked, that the U makes no difference, and there's no undefined behaviour here either, since unsigned overflow behaviour is defined.Code:TESTMETHOD(TestMod7)() { for (unsigned int i=0; i<0x33333336; ++i) { bool t1 = (i % 7) == 0; bool t2 = i * 2454267027U <= 613566756U; ASSERT(t1 == t2); } } TESTMETHOD(TestMod12)() { for (unsigned int i=0; i<0x80000000; ++i) { bool t1 = (i % 12) == 0; bool t2 = !(i & 3) && i * 1431655766U <= 1431655765U; ASSERT(t1 == t2); } }
Works on VS2008 and MinGW 5.1.6.
If the above fails on your compiler then I'd be inclined to start digging into assembly and figuring out what is wrong with the compiler, though perhaps you simply copied it down wrong.
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"
Ah, of course, I didn't have i unsigned. Nor the integer constants. When I changed that to unsigned it did work properly.
35% speed increase as well. Nice find ;-).