Two problems there.
Clamping using modulus
You really do not have to clamp the RGBs; just make sure you never pass a value larger than 31 to your pixel function (green will wrap). But just in case you want to clamp them:
Modulo is slower than a bitwise AND when the divisor is a power of 2.
Example: to compute the pixel offset
pixeloffset % 65536 -> slower
pixeloffset & 0xFFFF -> faster, same result
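A minimal sketch of that equivalence, assuming unsigned values (with negative signed operands % and & can give different results). The helper names wrap_mod, wrap_and, wrap5 and clamp5 are made up for illustration. Note that masking wraps rather than clamps: 33 & 31 gives 1, not 31, so a true clamp still needs a compare.

```c
#include <stdint.h>

/* Wrap an offset into a 64K window with modulo (the slower form). */
static uint32_t wrap_mod(uint32_t pixeloffset) { return pixeloffset % 65536u; }

/* Same result with a bitwise AND; this works because 65536 is a power of 2. */
static uint32_t wrap_and(uint32_t pixeloffset) { return pixeloffset & 0xFFFFu; }

/* Masking a 5-bit channel wraps: 33 becomes 1. */
static unsigned wrap5(unsigned c) { return c & 31u; }

/* A real clamp saturates instead: 33 becomes 31. */
static unsigned clamp5(unsigned c) { return c > 31u ? 31u : c; }
```

Which of the two you want depends on the effect: wrap5 is the cheaper one, clamp5 is the one that keeps an over-bright channel at full intensity.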
Don't compute those colors on every pixel. Use tables.
Now for color all you do is:
typedef unsigned int WORD; //16 bits on a 16-bit DOS compiler
WORD grn[32]; //one entry per 5-bit value
for (int i=0;i<32;i++)
    grn[i]=i<<5; //depending on card
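A fuller sketch of the table idea, assuming a 5-5-5 layout (red in bits 10-14, green in bits 5-9, blue in bits 0-4); a 5-6-5 card would need different shifts. The names red, blu, init_color_tables and make_color are assumptions for illustration, only grn comes from the snippet above.

```c
typedef unsigned short WORD;   /* 16 bits; on a 16-bit DOS compiler unsigned int works too */

static WORD red[32], grn[32], blu[32];

/* Fill the three tables once at startup. */
static void init_color_tables(void)
{
    int i;
    for (i = 0; i < 32; i++) {
        red[i] = (WORD)(i << 10);  /* assumed 5-5-5: red in bits 10-14 */
        grn[i] = (WORD)(i << 5);   /* green in bits 5-9, depending on card */
        blu[i] = (WORD)i;          /* blue in bits 0-4 */
    }
}

/* Per pixel: three lookups and two ORs, no shifts. r, g, b must be 0..31. */
static WORD make_color(int r, int g, int b)
{
    return (WORD)(red[r] | grn[g] | blu[b]);
}
```

Call init_color_tables() once, then make_color(r, g, b) for each pixel; nothing here checks the range, so the caller has to keep r, g and b within 0..31.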
This is assuming you have computed the correct pixeloffset/bank.
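For reference, a sketch of how the pixeloffset/bank pair might be computed, assuming 2 bytes per pixel and 64K banks with 64K granularity. The pitch parameter (bytes per scanline), the struct and the function name locate_pixel are hypothetical, and the actual bank switch is card or VESA specific, so it is left out.

```c
#include <stdint.h>

struct pixel_pos {
    uint16_t bank;    /* which 64K window the pixel falls in */
    uint16_t offset;  /* byte offset inside that window */
};

/* Split a linear byte address into bank and offset using shift and mask. */
static struct pixel_pos locate_pixel(uint32_t x, uint32_t y, uint32_t pitch)
{
    struct pixel_pos p;
    uint32_t addr = y * pitch + x * 2u;      /* assumed 2 bytes per pixel */
    p.bank   = (uint16_t)(addr >> 16);       /* same as addr / 65536 */
    p.offset = (uint16_t)(addr & 0xFFFFu);   /* same as addr % 65536 */
    return p;
}
```

With a pitch of 1280 bytes, pixel (128, 51) sits at linear address 65536, which is the first byte of bank 1.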
Inline asm looks a bit different from actual MASM, but it's the same idea.
You could store all color values as WORDs in one table, but accessing it would be slower than accessing 3 one-dimensional arrays. It would also incur a few more multiplies/adds which you don't have to do. However, all bitmaps should be stored as WORDs, so that each color corresponds directly to a WORD in the bitmap. It could all be done in asm and would be very fast.
mov ax,0a000h //screen seg - for a buffer, move the seg of the buffer instead