No. What I am saying is that your code first copies, then does the work, and then copies again.
This translates to a double for loop, a triple for loop, and another double for loop. The body of the triple for loop is executed 4*4*4 = 64 times, whereas the body of each double for loop is executed only 4*4 = 16 times. So which one affects your performance more? The one that is executed more often, that is, the triple for loop.
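To make the counts concrete, here is a minimal sketch that only counts how often each innermost loop body runs. The names (loop_counts, count_iterations, N) are made up for illustration and the loop bodies are placeholders, not your actual copy and MixColumns code.

```c
/* Hypothetical stand-in for the copy / work / copy structure.
   It does no real work; it only counts loop-body executions. */
enum { N = 4 };

struct loop_counts {
    int copy_in;   /* first double for loop  */
    int work;      /* triple for loop        */
    int copy_out;  /* second double for loop */
};

struct loop_counts count_iterations(void)
{
    struct loop_counts c = {0, 0, 0};
    int i, j, k;

    /* copy: N*N iterations */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            c.copy_in++;

    /* work: N*N*N iterations */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                c.work++;

    /* copy back: N*N iterations */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            c.copy_out++;

    return c;
}
```

With N = 4 the two copies together run 32 times and the triple loop 64 times, so whatever sits inside the triple loop is the first place to look.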
Personally, though, I do not believe that these operations take very long.
What about the functions mixcolumnslookup1 and mixcolumnslookup2? Maybe one of them is what makes your code so slow.
int added = mixcolumnslookup1(charstate[xorresult][mxcol]) + mixcolumnslookup1(multmatrix[mxrow][xorresult]);
if (added > 0xFF)
    added = added - 0xFF;
unsigned char final = added;
mxthenxor[xorresult] = mixcolumnslookup2(final);
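The lines above look like a multiplication in GF(2^8) done with a logarithm table (mixcolumnslookup1) and its inverse (mixcolumnslookup2). That is only my guess, since I have not seen those two functions. If the guess is right, each lookup should cost no more than one array access. Below is a rough sketch of that idea, not your implementation: the names gf_log, gf_exp, gf_init_tables and gf_mul are hypothetical, and I assume the AES polynomial 0x11B with generator 3.

```c
#include <stdint.h>

/* Hypothetical tables: gf_log plays the role I assume for
   mixcolumnslookup1, gf_exp the role of mixcolumnslookup2. */
static uint8_t gf_log[256];
static uint8_t gf_exp[256];

/* Fill both tables once at startup, using powers of the generator 3. */
void gf_init_tables(void)
{
    uint8_t x = 1;
    int i;
    for (i = 0; i < 255; i++) {
        uint8_t doubled;
        gf_exp[i] = x;
        gf_log[x] = (uint8_t)i;
        /* x * 3 = x ^ (x * 2), with x * 2 reduced modulo 0x11B */
        doubled = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
        x ^= doubled;
    }
    gf_exp[255] = gf_exp[0]; /* 3^255 == 3^0 == 1 */
}

/* Multiply two field elements with two log lookups and one exp lookup. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    int sum;
    if (a == 0 || b == 0)
        return 0;            /* zero has no logarithm */
    sum = gf_log[a] + gf_log[b];
    if (sum >= 255)
        sum -= 255;          /* exponents wrap modulo 255 */
    return gf_exp[sum];
}
```

If your mixcolumnslookup1 or mixcolumnslookup2 does something heavier than this on every call, for example searching for the value or rebuilding a table, then with 64 iterations of the triple for loop that cost adds up quickly.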