Efficient implementation of Strassen's algorithm for matrix multiplication
I am trying to implement Strassen's algorithm for matrix multiplication in C++.
I have implemented it, but the results are not much better than those of the traditional O(n^3) approach.
In fact, as is already known, for smaller matrices the traditional approach outperforms Strassen's algorithm.
I tried multiplying two matrices A and B. The performance I obtained is:
Input: A(512 x 512) and B(512 x 512)
Time (traditional): 7 seconds
Time (Strassen's): 7 seconds
Input: A(1024 x 1024) and B(1024 x 1024)
Time (traditional): 64 seconds
Time (Strassen's): 46 seconds
Input: A(2048 x 2048) and B(2048 x 2048)
Time (traditional): 529 seconds
Time (Strassen's): 527 seconds
The multiplication for A(4096 x 4096) and B(4096 x 4096) is still running on my machine :(
My concern is that Strassen's algorithm gives no significant improvement. I believe this is because my implementation of Strassen's method performs a lot of memory accesses. One particular thing I am doing is this:
I have a matrix A of type vector< vector<double> >.
I need to partition this matrix into 4 parts: A11, A12, A21, A22. For now I am copying the original matrix A into 4 smaller matrices. Is there any way to avoid this copy, for example by using references? I tried many ways but could not get it to work.
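To make the copying step concrete, here is a minimal sketch of that kind of partitioning. It assumes a square matrix with an even dimension. The names `Matrix`, `Quadrants`, `copy_block`, and `split` are illustrative only and are not taken from the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Copy the h x h block of `src` whose top-left corner is (row0, col0).
// Every call allocates a new matrix and copies h*h elements.
Matrix copy_block(const Matrix& src, std::size_t row0, std::size_t col0, std::size_t h) {
    assert(row0 + h <= src.size());
    Matrix block(h, std::vector<double>(h));
    for (std::size_t i = 0; i < h; ++i) {
        assert(col0 + h <= src[row0 + i].size());
        for (std::size_t j = 0; j < h; ++j)
            block[i][j] = src[row0 + i][col0 + j];
    }
    return block;
}

// The four quadrants of a matrix, each stored as an independent copy.
struct Quadrants {
    Matrix a11, a12, a21, a22;
};

// Split an n x n matrix (n even) into its four quadrants by copying.
Quadrants split(const Matrix& a) {
    const std::size_t n = a.size();
    assert(n % 2 == 0);
    const std::size_t h = n / 2;
    return Quadrants{copy_block(a, 0, 0, h), copy_block(a, 0, h, h),
                     copy_block(a, h, 0, h), copy_block(a, h, h, h)};
}
```

In a recursive routine this happens at every level for both A and B, which is the extra allocation and memory traffic in question.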
The Strassen function is called recursively and takes three matrices A, B, C as input so that it computes C = A x B.
So I have to partition A into 4 parts A11, A12, A21, A22, each of which must itself be of the matrix type so that it can be passed to the Strassen function again, and I do not want to copy elements to create them.
To rephrase: given a matrix of type vector< vector<double> >, can I create a smaller matrix from it without copying (by using references in some way)?
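One way to picture what is being asked for is a small non-owning "view" that remembers the parent matrix plus a row offset, a column offset, and a size. The sketch below is hypothetical: `MatrixView`, `quadrant`, and `whole` are invented names, and it assumes square blocks with even dimensions. It is not a drop-in replacement, since the recursive function would have to accept views instead of vector< vector<double> >.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Non-owning window onto an n x n block of a parent Matrix.
// The view is only valid while the parent matrix is alive and not resized.
struct MatrixView {
    Matrix* data;      // parent storage, not owned by the view
    std::size_t row0;  // row of the block's top-left corner in the parent
    std::size_t col0;  // column of the block's top-left corner in the parent
    std::size_t n;     // block dimension

    // Element (i, j) of the block, translated to parent coordinates.
    double& operator()(std::size_t i, std::size_t j) const {
        assert(i < n && j < n);
        return (*data)[row0 + i][col0 + j];
    }

    // Quadrant (qi, qj), with qi and qj each 0 or 1.
    // Only offsets are adjusted; no elements are copied.
    MatrixView quadrant(std::size_t qi, std::size_t qj) const {
        assert(n % 2 == 0 && qi < 2 && qj < 2);
        const std::size_t h = n / 2;
        return MatrixView{data, row0 + qi * h, col0 + qj * h, h};
    }
};

// View covering an entire square matrix.
MatrixView whole(Matrix& m) {
    return MatrixView{&m, 0, 0, m.size()};
}
```

With something like this, `whole(A).quadrant(0, 1)` would play the role of A12, and writes through a view land directly in the parent matrix. Temporaries such as A11 + A22 would still need their own storage.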
I may have made this post somewhat confusing. I would be happy to explain more if someone is willing to help.