i have to deal with some very large matrices, approximately 2^14 x 100 (16384 x 100).
a large share of the run time goes to managing the for loops rather than to the calculations themselves, so it would be great to optimize the loops.
looping with template chains (recursive template instantiation that unrolls the loop at compile time) works wonderfully for smaller matrices, but these matrices are far too large for the compiler's depth limits of 512 and 2000 for inheritance and specialization, respectively.
if anyone has ideas for how to partition the work so that the compiler doesn't hit those limits when using template chains, or for something more clever than simply writing out the unrolled statements by hand, i'd love to hear it.
thanks.