Hello!

I searched the forum on this topic. My problem is as follows: I wrote a C++ algorithm which involved segment trees and matrix product. I had a timelimit of 1.1 seconds.

In the "matrix product" function I reversed the order of the 'for'-s as follows:

It was implemented in a matrix class and A, B and C are in fact instances (didn't write all the code here)... The normal (and logical) order of the for-s was (i,j,k), but putting them this way (k,i,j), my code ran faster (worst case, from 1 sec -> 0.8 sec)...Code:`for (k = 0; k<n; ++k)`

for (i=0; i<n; ++i)

for (j=0; j<n; ++j)

C[i][j] += A[i][k] * B[k][j];

I would like to ask you why is this happening? Any permutation of (i,j,k) I would use, one array would be accessed "normally", one on one row at a time and one on one column at a time... so the cache is not working in some parts, but does in others.

I hope my English is not bad enough to make this post impossible to be followed! :)

Thank you!