I have been tuning for performance the matrix library, specially matrix multiplication. I have been using the well-known Strassen's algorithm to achieve sub-cubic complexity, but extending with zeroes the matrix to the next power of two order to match the hypothesis of the algorithm. Obviously, this method didn't work very well, but I took it as a starting point. From this point, my idea was to mix the standard multiplication with Strassen's idea, and to not expand the matrix unnecessarily.
After some work, I ended up extending the matrix to a square matrix of the next even order. Then apply one iteration of Strassen's algorithm. This process is iterated until certain order. Smaller matrices to this fixed order are multiplied using the definition. I have tried different switcher orders and benchmarking to see what's the best choice. These are benchmarks using switcher order k = 150. To understand it, note that multN means multiplication of square matrices of order N and that Definition and Strassen mixed mean that the multiplication has been done by definition or using my mixed algorithm respectively.
This benchmark table is hosted here.
Both take a very similar time for small entries, but, as the matrix grows, the difference gets bigger and bigger in favor of the mixed algorithm. This is a nice result that will be applied in the next release of matrix (0.2).