Two problems there.
Clamping using modulus
You don't really have to clamp the RGB values; just make sure you never pass a component larger than 31 to your pixel function (otherwise green will wrap). But just in case you want to clamp them:
Modulo is slower than a bitwise AND when the divisor is a power of 2, and for unsigned values the two give the same result.
Example: to compute a pixel offset
pixeloffset % 65536 -> slower
pixeloffset & 0xFFFF -> faster same result
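To make that concrete, here is a minimal C sketch, assuming unsigned values (the equivalence does not hold for negative signed ones). The helper names wrap_offset and mask5 are made up for illustration and are not from your code. Keep in mind that the AND wraps a value back into range; it does not saturate it at the maximum.

```c
#include <stdint.h>

/* Hypothetical helper: same result as offset % 65536 for unsigned input. */
static uint32_t wrap_offset(uint32_t offset)
{
    return offset & 0xFFFFu;
}

/* Hypothetical helper: keep a color component in 0..31, same as value % 32. */
static uint16_t mask5(uint16_t value)
{
    return (uint16_t)(value & 0x1Fu);
}
```

For example, mask5(33) gives 1, exactly as 33 % 32 would.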
Precompute
Don't compute those colors on every pixel. Use tables.
Code:
typedef unsigned short WORD; // 16 bits on DOS compilers and on 32-bit ones
WORD red[32];
WORD grn[32];
WORD blu[32];
for (int i=0;i<32;i++)
{
  red[i]=i<<11; // red in the top bits
  grn[i]=i<<5;  // depending on card
  blu[i]=i;     // blue in the low 5 bits
}
Now for color all you do is:
Code:
WORD color=red[rvalue]+grn[gvalue]+blu[bvalue];
This is assuming you have computed the correct pixeloffset/bank.
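If you need the offset/bank part too, here is a rough C sketch of one way to get it. It assumes 2 bytes per pixel and a 64 KB bank size, which may not match your card, and the PixelAddr type and pixel_addr function are hypothetical names, not anything from your code. It reuses the shift/AND idea from above instead of a divide and a modulo.

```c
#include <stdint.h>

/* Hypothetical result type: which 64 KB bank, and the byte offset inside it. */
typedef struct {
    uint16_t bank;
    uint16_t offset;
} PixelAddr;

/* Assumes 2 bytes per pixel and 64 KB banks; width is pixels per scanline. */
static PixelAddr pixel_addr(uint32_t x, uint32_t y, uint32_t width)
{
    uint32_t linear = (y * width + x) * 2u;   /* byte offset from start of video memory */
    PixelAddr a;
    a.bank   = (uint16_t)(linear >> 16);      /* same as linear / 65536 */
    a.offset = (uint16_t)(linear & 0xFFFFu);  /* same as linear % 65536 */
    return a;
}
```

With a 640-pixel-wide mode, the first pixel of line 52 lands in bank 1 at offset 1024.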
Inline asm looks a bit different from actual MASM, but it's the same idea.
Code:
asm
{
mov ax,0a000h //screen seg - for buffer move seg of buffer
mov es,ax //segment registers can't be loaded with an immediate, so go through ax
mov di,word(pixeloffset)
mov ax,color
stosw
}
You could store all color values as WORDs in a single table, but accessing it would be slower than accessing three one-dimensional arrays, and it would incur a few more multiplies/adds that you don't need. However, all bitmaps should be stored as WORDs, so that each WORD in the bitmap is already the final color. All of this could be done in asm and would be very fast.
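As a rough illustration of why WORD bitmaps pay off, here is a C sketch of a blit into an off-screen buffer. The function name blit16 and its parameters are hypothetical, it assumes a linear (non-banked) buffer, and it does no clipping. Since every source WORD is already a finished color, each row is just a straight copy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a w x h bitmap of ready-made 16-bit colors into a
   buffer that is pitch pixels wide, with its top-left corner at (x, y).
   No clipping: the caller must keep the bitmap inside the buffer. */
static void blit16(uint16_t *dst, unsigned pitch,
                   const uint16_t *src, unsigned w, unsigned h,
                   unsigned x, unsigned y)
{
    for (unsigned row = 0; row < h; row++) {
        memcpy(dst + (size_t)(y + row) * pitch + x,  /* start of the destination row */
               src + (size_t)row * w,                /* start of the source row */
               w * sizeof(uint16_t));
    }
}
```

In asm the inner copy would be a rep movsw per row, which is the same idea as the stosw above but for a whole run of pixels.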