I just explained this to someone at work last week who was doing a very similar thing with graphics.
Integer divisions lose information (the remainder) but integer multiplications don't (as long as they don't overflow).
One trick then is to perform multiplications before the divisions.
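To see why the order matters, here is a small sketch. The function names and the example sizes are made up for illustration and are not taken from your code; the first function behaves like repeatedly adding an already-truncated step.

```c
#include <assert.h>

/* Hypothetical helpers: map destination column x to a source column. */

/* Divides first: the remainder of src_w / dst_w is thrown away,
   and that error is then multiplied by x. */
static int map_divide_first(int x, int src_w, int dst_w)
{
    assert(dst_w > 0);
    return x * (src_w / dst_w);
}

/* Multiplies first: only a single truncation, at the very end. */
static int map_multiply_first(int x, int src_w, int dst_w)
{
    assert(dst_w > 0);
    return (x * src_w) / dst_w;
}
```

With src_w = 10, dst_w = 4 and x = 3, the first version gives 3 * 2 = 6 while the second gives 30 / 4 = 7. For very large images the product x * src_w could overflow an int, so a wider type may be needed there.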
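In your case (and the one I described to my colleague) the multiplications are being done through successive additions: those are your additions to pixelcounter. If the value being added has already been through an integer division, its rounding error gets added again on every step.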
Instead of performing multiplications via successive additions, you can just perform a multiplication:
E.g. Replace this line
with just this
and remove the no-longer-necessary code dealing with pixelcounter.
memcpy(&pixeldata, (buffer +
This will now generate correct results, and from there you can reapply the optimisation techniques you know, such as loop hoisting, to get a faster solution.
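Putting the multiply-before-divide idea together, a nearest-neighbour resize might look roughly like the sketch below. This is not your code: the function name, the parameters and the tightly packed 3-bytes-per-pixel layout are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nearest-neighbour resize of a tightly packed 24-bit image.
   All names and the buffer layout are assumed, not from the question. */
static void scale_nearest(const unsigned char *src, int src_w, int src_h,
                          unsigned char *dst, int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    for (int y = 0; y < dst_h; ++y) {
        /* Multiply first, divide last: no accumulated rounding error. */
        size_t sy = (size_t)y * (size_t)src_h / (size_t)dst_h;
        for (int x = 0; x < dst_w; ++x) {
            size_t sx = (size_t)x * (size_t)src_w / (size_t)dst_w;
            const unsigned char *p = src + (sy * (size_t)src_w + sx) * 3;
            unsigned char *q = dst + ((size_t)y * (size_t)dst_w + (size_t)x) * 3;
            memcpy(q, p, 3); /* copy one 3-byte pixel, byte for byte */
        }
    }
}
```

The row calculation is already hoisted out of the inner loop here, which is the kind of optimisation that is safe to reapply once the results are correct.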
It can be done without the divisions inside the loops if you do some research, but don't try doing that before you know how to get it to work at all.
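For later reference, one common way to avoid the per-pixel division is an error accumulator, in the spirit of Bresenham's line algorithm. The sketch below is only one possible approach and the function name is hypothetical; it fills a table with the same values that (x * src_w) / dst_w would give.

```c
#include <assert.h>

/* Hypothetical helper: fill map[0..dst_w-1] with floor(x * src_w / dst_w)
   using one division and one modulo in total, none inside the loop. */
static void build_column_map(int *map, int src_w, int dst_w)
{
    assert(src_w > 0 && dst_w > 0);
    const int q = src_w / dst_w; /* whole source columns per step */
    const int r = src_w % dst_w; /* remainder, tracked instead of discarded */
    int sx = 0;
    int err = 0;
    for (int x = 0; x < dst_w; ++x) {
        map[x] = sx;
        sx += q;
        err += r;
        if (err >= dst_w) { /* remainders have added up to a whole column */
            err -= dst_w;
            ++sx;
        }
    }
}
```

The remainder is carried along rather than thrown away, which is why this stays exact where plain successive addition drifts.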
So yeah, this is also a lesson in "get it working first, then get it fast".
Note that in a final solution I would recommend getting rid of the memcpy and just accessing the three consecutive bytes directly at the calculated address. This removes endian issues and removes one cause of the slowdown.
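As a rough sketch of what reading the bytes directly could look like: the helper name and the R, G, B byte order are assumptions, so swap the shifts if your buffer stores BGR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a 0xRRGGBB value from three consecutive bytes.
   Assumes the bytes are stored in R, G, B order. */
static uint32_t read_rgb24(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}
```

Because each byte is placed by an explicit shift, the result is the same on little-endian and big-endian machines.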
Did you know that you can get a much better image by averaging groups of pixels instead of just skipping them? You can also perform bilinear filtering or similar to improve the image quality. There's also http://en.wikipedia.org/wiki/Lanczos_resampling if you get really enthusiastic!
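If you want to try the averaging idea, a simple box filter might look something like this. It is a minimal sketch that assumes a tightly packed 24-bit image and a destination no larger than the source; the function name and parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Box-filter downscale: each destination pixel is the average of the
   block of source pixels it covers. Names and layout are assumed. */
static void scale_box(const unsigned char *src, int src_w, int src_h,
                      unsigned char *dst, int dst_w, int dst_h)
{
    assert(dst_w > 0 && dst_h > 0 && dst_w <= src_w && dst_h <= src_h);
    const size_t sw = (size_t)src_w, sh = (size_t)src_h;
    const size_t dw = (size_t)dst_w, dh = (size_t)dst_h;
    for (size_t y = 0; y < dh; ++y) {
        /* Source rows [y0, y1) covered by destination row y. */
        size_t y0 = y * sh / dh;
        size_t y1 = (y + 1) * sh / dh;
        for (size_t x = 0; x < dw; ++x) {
            size_t x0 = x * sw / dw;
            size_t x1 = (x + 1) * sw / dw;
            unsigned long sum[3] = {0, 0, 0};
            for (size_t sy = y0; sy < y1; ++sy)
                for (size_t sx = x0; sx < x1; ++sx)
                    for (int c = 0; c < 3; ++c)
                        sum[c] += src[(sy * sw + sx) * 3 + (size_t)c];
            /* Never zero, because the destination is not larger than the source. */
            size_t n = (y1 - y0) * (x1 - x0);
            for (int c = 0; c < 3; ++c)
                dst[(y * dw + x) * 3 + (size_t)c] = (unsigned char)(sum[c] / n);
        }
    }
}
```

The block boundaries use the same multiply-before-divide calculation as the pixel lookup, so the blocks tile the source exactly with no gaps or overlaps.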