Essentially I'm doing an assignment of a loop with 9973 integers in an array (closest prime to 10,000 so dual loops are needed), adding them up, and doing this 200,000 times. It's an assignment about unrolling and efficiency in building loops.
Just to say, we're specifically not allowed to have the compiler optimize.
All that said, we have different targets for different grades, and while I've gone from 16.5 seconds (starting code) to about 7.7-7.9 seconds, I'm still in need of shaving several seconds. I'm hoping I can get down to 2-3 (as that approaches his records.)
(Just for fun, those are averages, and my lowest code so far on it's best run is ~7.1)
My problem is in the limitations of what code we can change, and my stubbornness to make my code robust, so that ANY size of an array may be used, I'm stuck with a modulus; essentially a division running 200,000 times. If anyone knows clock cycles, you know a division is about 30 times worse than additions or bit-wise operations.
So with all of that out of the way
Question
I've heard you can use the logical AND to turn moduli of the form (2^n) into bitwise functions of the form ((2^n)-1). I.E. You could AND the ARRAY_SIZE with '7' (if doing 8 unrolls) and get the remainder.
I tried simply doing: efficientRemainder = ARRAY_SIZE && 7 (and similar), but this is quite wrong. Or the source is incorrect.
If I could have any advice on either getting this bitwise operation to work, or another way to avoid this modulus with robust code, I'd appreciate it. I might try a Do While loop to get rid of it, but it's easy to start accessing unused or incorrect memory when you're unrolling a Prime. Modulus or similar was my first thought, and while it works, I fear that it is the majority of my overhead.