I tried to call cblas_dgemm from C++ code to do matrix multiplication. But with same matrices, the C++ code is much slower than matlab command C=A*B. Anybody know why? I thought matlab also uses BLAS.
Thanks!
Searching Wikipedia for BLAS and GEMM, the following sentences stand out:
"Heavily used in high-performance computing, highly optimized implementations of the BLAS interface have been developed by hardware vendors such as Intel and AMD, as well as by other authors"
"GEMM is often tuned by High Performance Computing vendors to run as fast as possible, because it is the building block for so many other routines. It is also the most important routine in the LINPACK benchmark. For this reason, implementations of fast BLAS library typically focus first on GEMM performance."
My guess is that Matlab, being expensive as it is, has some proprietary optimization under the hood.
cblas_dgemm does more than plain matrix multiplication: it actually computes C = alpha*A*B + beta*C, where alpha and beta are scalars. So it is almost certainly not ideally tuned for a simple matrix multiplication (even if its implementation is tuned for each hardware architecture).
I therefore wouldn't bet that Matlab uses cblas_dgemm for a simple matrix multiplication - either it uses some other, more dedicated BLAS function, or something optimised by hand.
Even if Matlab does use cblas_dgemm for a simple matrix multiplication, it probably employs additional techniques designed to optimise performance for storage and access of matrices and vectors.