Modulo 12

**jack_carver** · 04-08-2010

Hi

Is there any way to do the modulo 12 operation (i.e, n%12 ) in a faster way (other than the obvious n%12 way ?
I am implementing sieve of atkin so need to speed it up

Thanks a lot

**Dino** · 04-08-2010

Review the generated assembler, determine its inefficiencies, and code your own optimized inline assembler.

**EVOEx** · 04-08-2010

Originally Posted by Dino

Review the generated assembler, determine its inefficiencies, and code your own optimized inline assembler.

I seriously doubt you can make anything faster than what is probably a single instruction to the CPU. Especially if you compile with full optimization, you're not going to get anything faster.

In fact, I think C++ is generally faster than assembly language, as the compiler can better understand a lot of optimization techniques that are practically impossible for humans to apply correctly: for instance mixing up certain instructions so that memory read/writes are further apart - but how many instructions should be in between to be able to be as fast as possible? I have no idea on that. In fact, I think it even depends on the number of clock cycles it takes for the instructions between the memory accessing.

**Dino** · 04-08-2010

Originally Posted by EVOEx

I seriously doubt you can make anything faster than what is probably a single instruction to the CPU. Especially if you compile with full optimization, you're not going to get anything faster.

In fact, I think C++ is generally faster than assembly language, as the compiler can better understand a lot of optimization techniques that are practically impossible for humans to apply correctly: for instance mixing up certain instructions so that memory read/writes are further apart - but how many instructions should be in between to be able to be as fast as possible? I have no idea on that. In fact, I think it even depends on the number of clock cycles it takes for the instructions between the memory accessing.

I know that, and you know that, but the OP needs to figure this out.

**nonoob** · 04-08-2010

It doesn't make sense to want to speed up a modulo 12 operation. That's not going to be the bottleneck. One should always concentrate on areas that are truly slow. Such as any implementation for finding primes will have complexities - even decision making along the way whose execution time will easily exceed any modulo.

**iMalc** · 04-09-2010

Some very useful links for this:
http://www.hackersdelight.org/divcMore.pdf
Bit Tricks, Part I: Exact Division

**EVOEx** · 04-09-2010

Originally Posted by iMalc

Some very useful links for this:
http://www.hackersdelight.org/divcMore.pdf
Bit Tricks, Part I: Exact Division

Right. I tried the modulo of the second link. Here are two versions I build to calculate modulo 12...

test_normal.cpp:

Code:

#include <iostream>
#include <cmath>

int main()
{
	double total = 0;
	for(int i = 0; i < 100000000; ++i) {
		if(i % 7 == 0)
			total += i;
	}
	std::cout << total << std::endl;
}

test_hack.cpp:

Code:

#include <iostream>
#include <cmath>

int main()
{
	double total = 0;
	float flr = floor(pow(2, 32)/12);
	float mlt = 1.0/12.0;
	for(int i = 0; i < 100000000; ++i) {
		if(i * mlt <= flr)
			total += i;
	}
	std::cout << total << std::endl;
}

Excuse me the horrible code. Anyways, first of all, they both output something different, so there seems to be something wrong with the hack-method (or I screwed it up). And even if there wasn't, the two ran at approximately the same speed (the hacked one even slower, but negligeable).
So, no, I don't think it's worth it to add such hacks.

You must know all those bit-hacks that are listed on some page? Well, they used to be good, but if you look at it on current processors, you'll notice that the normal versions are just as fast nowadays, and often enough even faster. So don't use them, unless you actually try it and make sure it's faster.

**iMalc** · 04-09-2010

No the problem is that you've misinterpreted the code. There is not supposed to be any floats whatsoever in the program, just a few precalculated integer constants.
Admittedly, it is totally unclear what inverse(7) means in that code (it's just some other integer constant).
I should have time to try it out myself later.

**iMalc** · 04-09-2010

Okay, I worked out what inverse(7) is for this example. It is 2454267027 which comes from:

Code:

ceil((((1<<32) % 7)<<32) / 7)

613566756 is

Code:

floor((1<<32) / 7)

Try the following code:

Code:

#include <iostream>
#include <cmath>

int main()
{
	double total = 0;
	for (unsigned int i = 0; i < 100000000; ++i) {
		if (i * 2454267027 <= 613566756)
			total += i;
	}
	std::cout << total << std::endl;
}

That should give the same answer and be faster than the i%7 version. Note that a great deal of the time will probably be spent on the total += i; line because that requires an int to double conversion. If you really want to see how much quicker it is, you'll need to change i and total to unsigned int in both versions.

Note that for this hack for i%7, it only works up to 858993461 (0x33333335)
It can be tested basically like this:

Code:

for (unsigned int i=0; i<=858993461; ++i) {
    bool t1 = (i % 7) == 0;
    bool t2 = i * 2454267027 <= 613566756;
    assert(t1 == t2);
}

**iMalc** · 04-09-2010

Appliying this technique to the original mod 12 problem. The following is one solution to if (n % 12 == 0):

Code:

if (!(n & 3) && n * 1431655766 <= 1431655765)

The extra !(n & 3) test is because this technique only works on odd numbers, so it's actually doing a test for n%3 in that multiplication, not n%12
Note that it works only up to 2147483647 (0x7FFFFFFF), and it's up to you to test the performance difference.

I just realised that you actually need the result of the modulus rather than just testing it against zero, so I still recommend going through both of the above articles, and working it out from that.

**EVOEx** · 04-10-2010

@iMalc:
The results still differ for me (I made it total++ and made total an integer). But, granted, the normal version was 50% slower than the hack.

**iMalc** · 04-10-2010

Originally Posted by EVOEx

@iMalc:
The results still differ for me

Impossible! My units tests prove conclusively that they give the same answers all the way up the the limits I stated:

Code:

TESTMETHOD(TestMod7)()
{
	for (unsigned int i=0; i<0x33333336; ++i) {
		bool t1 = (i % 7) == 0;
		bool t2 = i * 2454267027U <= 613566756U;
		ASSERT(t1 == t2);
	}
}

TESTMETHOD(TestMod12)()
{
	for (unsigned int i=0; i<0x80000000; ++i) {
		bool t1 = (i % 12) == 0;
		bool t2 = !(i & 3) && i * 1431655766U <= 1431655765U;
		ASSERT(t1 == t2);
	}
}

I've also checked, that the U makes no difference, and there's no undefined behaviour here either, since unsigned overflow behaviour is defined.
Works on VS2008 and MinGW 5.1.6.
If the above fails on your compiler then I'd be inclined to start digging into assembly and figuring out what is wrong with the compiler, though perhaps you simply copied it down wrong.

**EVOEx** · 04-10-2010

Ah, of course, I didn't have i unsigned. Nor the integer constants. When I changed that to unsigned it did work properly.
35% speed increase as well. Nice find ;-).

Thread: Modulo 12

Thread Tools

Search Thread

Display

Modulo 12

Similar Threads

Nearly finished, but a single problem remains.

Mesmerizing Array problem

How to search an array for certain numbers and count them.

Is this me or Windows?

Can't figure out why?