Thread: Modulo 12

  1. #1
    Registered User
    Join Date
    Jan 2009
    Posts
    71

    Modulo 12

    Hi

    Is there any way to do the modulo 12 operation (i.e, n%12 ) in a faster way (other than the obvious n%12 way ?
    I am implementing sieve of atkin so need to speed it up

    Thanks a lot

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Review the generated assembler, determine its inefficiencies, and code your own optimized inline assembler.
    Mainframe assembler programmer by trade. C coder when I can.

  3. #3
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Quote Originally Posted by Dino View Post
    Review the generated assembler, determine its inefficiencies, and code your own optimized inline assembler.
    I seriously doubt you can make anything faster than what is probably a single instruction to the CPU. Especially if you compile with full optimization, you're not going to get anything faster.

    In fact, I think C++ is generally faster than assembly language, as the compiler can better understand a lot of optimization techniques that are practically impossible for humans to apply correctly: for instance mixing up certain instructions so that memory read/writes are further apart - but how many instructions should be in between to be able to be as fast as possible? I have no idea on that. In fact, I think it even depends on the number of clock cycles it takes for the instructions between the memory accessing.

  4. #4
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Quote Originally Posted by EVOEx View Post
    I seriously doubt you can make anything faster than what is probably a single instruction to the CPU. Especially if you compile with full optimization, you're not going to get anything faster.

    In fact, I think C++ is generally faster than assembly language, as the compiler can better understand a lot of optimization techniques that are practically impossible for humans to apply correctly: for instance mixing up certain instructions so that memory read/writes are further apart - but how many instructions should be in between to be able to be as fast as possible? I have no idea on that. In fact, I think it even depends on the number of clock cycles it takes for the instructions between the memory accessing.
    I know that, and you know that, but the OP needs to figure this out.
    Mainframe assembler programmer by trade. C coder when I can.

  5. #5
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    It doesn't make sense to want to speed up a modulo 12 operation. That's not going to be the bottleneck. One should always concentrate on areas that are truly slow. Such as any implementation for finding primes will have complexities - even decision making along the way whose execution time will easily exceed any modulo.

  6. #6
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  7. #7
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Quote Originally Posted by iMalc View Post
    Right. I tried the modulo of the second link. Here are two versions I build to calculate modulo 12...

    test_normal.cpp:
    Code:
    #include <iostream>
    #include <cmath>
    
    int main()
    {
    	double total = 0;
    	for(int i = 0; i < 100000000; ++i) {
    		if(i % 7 == 0)
    			total += i;
    	}
    	std::cout << total << std::endl;
    }

    test_hack.cpp:
    Code:
    #include <iostream>
    #include <cmath>
    
    int main()
    {
    	double total = 0;
    	float flr = floor(pow(2, 32)/12);
    	float mlt = 1.0/12.0;
    	for(int i = 0; i < 100000000; ++i) {
    		if(i * mlt <= flr)
    			total += i;
    	}
    	std::cout << total << std::endl;
    }
    Excuse me the horrible code. Anyways, first of all, they both output something different, so there seems to be something wrong with the hack-method (or I screwed it up). And even if there wasn't, the two ran at approximately the same speed (the hacked one even slower, but negligeable).
    So, no, I don't think it's worth it to add such hacks.

    You must know all those bit-hacks that are listed on some page? Well, they used to be good, but if you look at it on current processors, you'll notice that the normal versions are just as fast nowadays, and often enough even faster. So don't use them, unless you actually try it and make sure it's faster.

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    No the problem is that you've misinterpreted the code. There is not supposed to be any floats whatsoever in the program, just a few precalculated integer constants.
    Admittedly, it is totally unclear what inverse(7) means in that code (it's just some other integer constant).
    I should have time to try it out myself later.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Okay, I worked out what inverse(7) is for this example. It is 2454267027 which comes from:
    Code:
    ceil((((1<<32) % 7)<<32) / 7)
    613566756 is
    Code:
    floor((1<<32) / 7)
    Try the following code:
    Code:
    #include <iostream>
    #include <cmath>
    
    int main()
    {
    	double total = 0;
    	for (unsigned int i = 0; i < 100000000; ++i) {
    		if (i * 2454267027 <= 613566756)
    			total += i;
    	}
    	std::cout << total << std::endl;
    }
    That should give the same answer and be faster than the i%7 version. Note that a great deal of the time will probably be spent on the total += i; line because that requires an int to double conversion. If you really want to see how much quicker it is, you'll need to change i and total to unsigned int in both versions.

    Note that for this hack for i%7, it only works up to 858993461 (0x33333335)
    It can be tested basically like this:
    Code:
    for (unsigned int i=0; i<=858993461; ++i) {
        bool t1 = (i % 7) == 0;
        bool t2 = i * 2454267027 <= 613566756;
        assert(t1 == t2);
    }
    Last edited by iMalc; 04-09-2010 at 04:13 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Appliying this technique to the original mod 12 problem. The following is one solution to if (n % 12 == 0):
    Code:
    if (!(n & 3) && n * 1431655766 <= 1431655765)
    The extra !(n & 3) test is because this technique only works on odd numbers, so it's actually doing a test for n%3 in that multiplication, not n%12
    Note that it works only up to 2147483647 (0x7FFFFFFF), and it's up to you to test the performance difference.

    I just realised that you actually need the result of the modulus rather than just testing it against zero, so I still recommend going through both of the above articles, and working it out from that.
    Last edited by iMalc; 04-09-2010 at 04:42 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #11
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    @iMalc:
    The results still differ for me (I made it total++ and made total an integer). But, granted, the normal version was 50% slower than the hack.

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by EVOEx View Post
    @iMalc:
    The results still differ for me
    Impossible! My units tests prove conclusively that they give the same answers all the way up the the limits I stated:
    Code:
    TESTMETHOD(TestMod7)()
    {
    	for (unsigned int i=0; i<0x33333336; ++i) {
    		bool t1 = (i % 7) == 0;
    		bool t2 = i * 2454267027U <= 613566756U;
    		ASSERT(t1 == t2);
    	}
    }
    
    TESTMETHOD(TestMod12)()
    {
    	for (unsigned int i=0; i<0x80000000; ++i) {
    		bool t1 = (i % 12) == 0;
    		bool t2 = !(i & 3) && i * 1431655766U <= 1431655765U;
    		ASSERT(t1 == t2);
    	}
    }
    I've also checked, that the U makes no difference, and there's no undefined behaviour here either, since unsigned overflow behaviour is defined.
    Works on VS2008 and MinGW 5.1.6.
    If the above fails on your compiler then I'd be inclined to start digging into assembly and figuring out what is wrong with the compiler, though perhaps you simply copied it down wrong.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  13. #13
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Ah, of course, I didn't have i unsigned. Nor the integer constants. When I changed that to unsigned it did work properly.
    35% speed increase as well. Nice find ;-).

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Nearly finished, but a single problem remains.
    By jaja009 in forum C Programming
    Replies: 4
    Last Post: 03-26-2010, 04:41 PM
  2. Mesmerizing Array problem
    By jaja009 in forum C Programming
    Replies: 6
    Last Post: 03-26-2010, 03:11 AM
  3. Replies: 9
    Last Post: 03-13-2010, 03:40 AM
  4. Is this me or Windows?
    By carrotcake1029 in forum C Programming
    Replies: 9
    Last Post: 05-07-2008, 09:18 AM
  5. Can't figure out why?
    By kwigibo in forum C Programming
    Replies: 10
    Last Post: 10-14-2001, 10:58 PM