Hmm... you know, this is kinda processor-level stuff... but here goes.
Addition is trivial because the worst you can do is add two 8-bit values (plus an incoming carry) and get a 9-bit value. So just add the two values starting from the least significant byte, detect the overflow (I have no idea how one would do this in C...), carry it into the next 8 bits, and repeat.
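In C, one portable way to catch that ninth bit is to do each byte addition in a wider unsigned type and read bit 8 as the carry. A minimal sketch, assuming the numbers are stored as `uint8_t` arrays with the least significant byte first (`add_bytes` is a made-up name for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-byte numbers stored least significant byte first.
   sum may alias a or b. Returns the final carry out (0 or 1). */
uint8_t add_bytes(uint8_t *sum, const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i] + carry; /* at most 0x1FF: 9 bits */
        sum[i] = (uint8_t)s;                        /* keep the low 8 bits */
        carry = s >> 8;                             /* the ninth bit is the carry */
    }
    return (uint8_t)carry;
}
```

For example, adding 0x01FF and 0x0001 as {0xFF, 0x01} + {0x01, 0x00} gives {0x00, 0x02}, i.e. 0x0200, with no carry out of the top byte.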
Subtraction is addition: add the two's complement of the second operand (invert every byte, then add 1).
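To make that concrete: a - b equals a + (~b) + 1 modulo 2^(8n), so you invert every byte of b and start the byte-wise addition with a carry-in of 1 instead of 0. A sketch under the same assumptions as the addition (least-significant-byte-first `uint8_t` arrays; `sub_bytes` is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

/* Computes diff = a - b for n-byte numbers stored least significant byte first,
   as a + (~b) + 1. Returns the final carry out: 1 when a >= b (no borrow),
   0 when the subtraction wrapped around. */
uint8_t sub_bytes(uint8_t *diff, const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned carry = 1;                              /* the "+1" of two's complement */
    for (size_t i = 0; i < n; i++) {
        unsigned s = (unsigned)a[i] + (uint8_t)~b[i] + carry; /* add inverted byte */
        diff[i] = (uint8_t)s;
        carry = s >> 8;
    }
    return (uint8_t)carry;
}
```

Note the carry flag comes out inverted relative to a "borrow": a final carry of 1 means no borrow occurred.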
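Multiplication is where it gets funky... I'm inclined to say use shift-and-add, but I suspect it might be better to just use 8-bit multiplication with a lot of funky additions. This also depends on how your processor handles 8-bit multiplication. Typically, it lets you multiply two 8-bit values to get a 16-bit value, the two halves of which end up in separate registers. Splitting each 16-bit operand into a high and a low byte (a = aH*256 + aL, and likewise for b) gives four 8x8 partial products:
a*b = (aH*bH << 16) + ((aH*bL + aL*bH) << 8) + aL*bL
so all four products have to be computed and added in at the right byte offsets, including the two cross terms.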
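For comparison, shift-and-add works like this: for each set bit of the multiplier, add the correspondingly shifted multiplicand into the result. The sketch below uses native 32-bit integers so the idea stays visible; on an 8-bit machine each shift and add would itself be a multi-byte operation like the addition above. `mul16_shift_add` is a hypothetical name.

```c
#include <stdint.h>

/* Shift-and-add multiply: 16 x 16 -> 32 bits, one multiplier bit per step. */
uint32_t mul16_shift_add(uint16_t a, uint16_t b)
{
    uint32_t result = 0;
    uint32_t addend = a;        /* a, shifted left once per bit of b consumed */
    while (b != 0) {
        if (b & 1)              /* this bit of b is set: add the shifted a */
            result += addend;
        addend <<= 1;
        b >>= 1;
    }
    return result;
}
```

The loop needs up to 16 shift/add rounds, while the partial-product approach needs four 8x8 multiplies plus a handful of byte additions, so which one is faster depends on how cheap the processor's 8-bit multiply is.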
Code:
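#include <stdint.h>

/* mult8 and oFlowAdd are imagined helpers: mult8 is an 8x8 -> 16-bit multiply
   split into low/high bytes; oFlowAdd does *dst += v and returns the carry (0 or 1). */
void mult8(uint8_t x, uint8_t y, uint8_t *lo, uint8_t *hi);
uint8_t oFlowAdd(uint8_t *dst, uint8_t v);

/* 16x16 -> 32-bit multiply; result[0] is the most significant byte. */
void mul16(uint16_t a, uint16_t b, uint8_t result[4])
{
    uint8_t lo, hi, carry;
    int i, j, k;

    for (k = 0; k < 4; k++)
        result[k] = 0;

    for (i = 0; i < 2; i++)            /* byte i of a (0 = low byte) */
        for (j = 0; j < 2; j++) {      /* byte j of b: all four partial products */
            mult8((uint8_t)(a >> (8 * i)), (uint8_t)(b >> (8 * j)), &lo, &hi);
            k = 3 - (i + j);           /* byte that receives the low half */
            carry = oFlowAdd(&result[k], lo);
            carry = oFlowAdd(&result[k - 1], (uint8_t)(hi + carry)); /* hi <= 0xFE, no wrap */
            for (k -= 2; carry && k >= 0; k--)   /* ripple the carry upward */
                carry = oFlowAdd(&result[k], carry);
        }
}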
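Of course, mult8 and oFlowAdd are functions I imagined; oFlowAdd in particular would probably have to be implemented in assembly to be efficient, since it is just an add followed by a look at the processor's carry flag. Really, this does belong in the realm of assembly/microprocessor design.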
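If you want to try the routine on an ordinary compiler before reaching for assembly, the two imagined helpers can be written in portable C. This is a sketch assuming the signatures `void mult8(uint8_t x, uint8_t y, uint8_t *lo, uint8_t *hi)` and `uint8_t oFlowAdd(uint8_t *dst, uint8_t v)`; these are a guess at what the imagined functions would look like, not any real library API.

```c
#include <stdint.h>

/* Hypothetical helper: 8 x 8 -> 16-bit multiply, split into low and high bytes
   (on an 8-bit CPU this would be the native multiply instruction). */
void mult8(uint8_t x, uint8_t y, uint8_t *lo, uint8_t *hi)
{
    uint16_t p = (uint16_t)(x * y);   /* at most 0xFE01, fits in 16 bits */
    *lo = (uint8_t)p;
    *hi = (uint8_t)(p >> 8);
}

/* Hypothetical helper: *dst += v, returning the carry out (0 or 1).
   In assembly this is roughly one add instruction plus reading the carry flag. */
uint8_t oFlowAdd(uint8_t *dst, uint8_t v)
{
    unsigned s = (unsigned)*dst + v;  /* 9-bit sum */
    *dst = (uint8_t)s;
    return (uint8_t)(s >> 8);
}
```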