I'm writing some functions that need to multiply large matrices together. This is what I have so far. It works fine, but it runs too slowly. I need to call this function a lot, so it needs to be fast. What could I do to the following code to make it run faster, other than loop unrolling? I know how to do that; I just didn't want to type it all out here.

Code:
void mm(void)
{
    int i, j, k;
    unsigned long sum;

    /* Multiply the two arrays and store the result in a third array */
    for (j = 0; j < 512; j++)
        for (i = 0; i < 512; i++) {
            sum = 0;
            for (k = 0; k < 128; k++)
                sum += ARRAY_4A[i][k] * ARRAY_4B[k][j];
            ARRAY_RES[i][j] = sum;
        }
}
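
For what it's worth, one thing that might be worth trying besides unrolling is changing the loop order. C stores 2D arrays row by row, so the inner k loop above jumps a whole row of ARRAY_4B on every step. The sketch below reorders the loops so the innermost one walks along a row of ARRAY_4B and ARRAY_RES instead. The array declarations, the ROWS/INNER/COLS names, the unsigned long element type, and the name mm_reordered are all assumptions, since the declarations aren't shown here. It should give the same results, but whether it is actually faster would need to be measured on the target machine.

```c
/* Assumed dimensions and element type: the real declarations are not shown. */
#define ROWS  512   /* rows of ARRAY_4A and ARRAY_RES */
#define INNER 128   /* columns of ARRAY_4A, rows of ARRAY_4B */
#define COLS  512   /* columns of ARRAY_4B and ARRAY_RES */

static unsigned long ARRAY_4A[ROWS][INNER];
static unsigned long ARRAY_4B[INNER][COLS];
static unsigned long ARRAY_RES[ROWS][COLS];

/* Same product as mm(), with the loops ordered i, k, j (hypothetical variant). */
void mm_reordered(void)
{
    int i, j, k;

    for (i = 0; i < ROWS; i++) {
        /* Clear the result row before accumulating into it. */
        for (j = 0; j < COLS; j++)
            ARRAY_RES[i][j] = 0;

        for (k = 0; k < INNER; k++) {
            /* This element of ARRAY_4A is reused for the whole inner loop. */
            unsigned long a = ARRAY_4A[i][k];

            /* Innermost loop reads and writes consecutive memory. */
            for (j = 0; j < COLS; j++)
                ARRAY_RES[i][j] += a * ARRAY_4B[k][j];
        }
    }
}
```

The trade-off is that the running sum is no longer kept in a local variable. Each partial product is added straight into ARRAY_RES, so the result row has to be zeroed first.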