If you are programming for 32-bit and your compiler supports MMX in inline asm (or you have an assembler that does), you could do this very fast.
I cannot see how a rectangular bitmap that small could slow things down that much, even with a transparency effect on it. This is a simple linear interpolation between the pixel that is already there and the pixel you want to put there: result = v1 + f1 * (v2 - v1), with f1 between 0 and 1. With f1 near 0 the result stays close to v1, and with f1 near 1 it moves toward v2, so you can show more of the background image or more of the foreground (superimposed) image, whichever you prefer.
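short LI(short v1, short v2, double f1)
{
    short final; // result: v1 + f1 * (v2 - v1)
    __asm { // MSVC-style 32-bit inline asm assumed; uses the x87 FPU
        fild  word ptr [v2]     // integer load v2 to st(0)
        fisub word ptr [v1]     // integer subtract v1 from st(0)
        fmul  qword ptr [f1]    // float multiply f1 * st(0)
        fiadd word ptr [v1]     // integer add v1 to st(0)
        fistp word ptr [final]  // integer store in final, pop st(0)
    }
    return final;
}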
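
If your compiler has no inline asm, the same interpolation can be sketched in plain C with integer math only. This is just a rough sketch under my own assumptions: the names blend_channel and blend_pixel are made up for illustration, the blend factor is an 8-bit alpha (0 to 255) instead of a double, and pixels are assumed to be packed as 0x00RRGGBB.

```c
#include <stdint.h>

/* Hypothetical helper: interpolate one 8-bit channel.
   alpha = 0 returns bg, alpha = 255 returns fg. */
static uint8_t blend_channel(uint8_t bg, uint8_t fg, uint8_t alpha)
{
    /* bg + (fg - bg) * alpha / 255, rounded to the nearest integer */
    int diff = (int)fg - (int)bg;
    int half = (diff >= 0) ? 127 : -127;   /* rounding term, sign-aware */
    return (uint8_t)(bg + (diff * alpha + half) / 255);
}

/* Hypothetical helper: blend two pixels, assuming 0x00RRGGBB layout. */
static uint32_t blend_pixel(uint32_t bg, uint32_t fg, uint8_t alpha)
{
    uint32_t r = blend_channel((bg >> 16) & 0xFF, (fg >> 16) & 0xFF, alpha);
    uint32_t g = blend_channel((bg >> 8) & 0xFF, (fg >> 8) & 0xFF, alpha);
    uint32_t b = blend_channel(bg & 0xFF, fg & 0xFF, alpha);
    return (r << 16) | (g << 8) | b;
}
```

You would call blend_pixel once per pixel of the rectangle, reading the background pixel from the screen buffer and writing the result back. Staying in integers avoids the int-to-float conversions, which is also roughly how an MMX version would work, just on several channels at once.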