Bilinear filter is pretty easy. You need to use fixed point math, lots of shifts and MMX.
I've had no experience with voxel renderes so I don't know how your pipeline works, but typically in a polygon renderer you're texture coordinates are scaled from 0 to 1 to the actual dimensions of the texture. At this time, you'll want to convert them to fixed point numbers.
Assuming you have a fixed point number in texture space, the algorithm is relatively simple.
The following code fragment is C++ friendly and isn't necessarily fast, though it is 100% accurate. I grabbed this from an image processor I wrote for Windows. I'll leave the details of converting it into assembly language up to you.
Code:
DWORD GetSubTexel( int x, int y )
{
const int h = (x & 0xff00) / 255;
const int i = (y & 0xff00) / 255;
x = x >> 16;
y = y >> 16;
const COLORREF cr1 = GetTexel( x + 0, y + 0 );
const COLORREF cr2 = GetTexel( x + 1, y + 0 );
const COLORREF cr3 = GetTexel( x + 1, y + 1 );
const COLORREF cr4 = GetTexel( x + 0, y + 1 );
const int a = (0x100 - h) * (0x100 - i);
const int b = (0x000 + h) * (0x100 - i);
const int c = (0x000 + h) * (0x000 + i);
const int d = 65536 - a - b - c;
const unsigned int R = 0x00ff0000 & (((cr1 >> 16) * a) + ((cr2 >> 16) * b) + ((cr3 >> 16) * c) + ((cr4 >> 16) * d));
const unsigned int G = 0xff000000 & (((cr1 & 0x00ff00) * a) + ((cr2 & 0x00ff00) * b) + ((cr3 & 0x00ff00) * c) + ((cr4 & 0x00ff00) * d));
const unsigned int B = 0x00ff0000 & (((cr1 & 0x0000ff) * a) + ((cr2 & 0x0000ff) * b) + ((cr3 & 0x0000ff) * c) + ((cr4 & 0x0000ff) * d));
return R|((G|B)>>16);
}