I searched the forum on this topic. My problem is as follows: I wrote a C++ algorithm which involved segment trees and matrix product. I had a timelimit of 1.1 seconds.
In the "matrix product" function I reversed the order of the 'for'-s as follows:
It was implemented in a matrix class and A, B and C are in fact instances (didn't write all the code here)... The normal (and logical) order of the for-s was (i,j,k), but putting them this way (k,i,j), my code ran faster (worst case, from 1 sec -> 0.8 sec)...Code:
for (k = 0; k<n; ++k)
for (i=0; i<n; ++i)
for (j=0; j<n; ++j)
C[i][j] += A[i][k] * B[k][j];
I would like to ask you why is this happening? Any permutation of (i,j,k) I would use, one array would be accessed "normally", one on one row at a time and one on one column at a time... so the cache is not working in some parts, but does in others.
I hope my English is not bad enough to make this post impossible to be followed! :)