I thought about posting this on the general c++ board, but as it relates to more theory, programming, and engineering in general instead of being specific to c++, i decided to post it here.

Many times as programmers we are presented with the problem of doing some operation on vectors and matrixes. Some examples of this is in relational algebra, when we have to do matrix multiplication, join operations, select operations, and project operations.

Some algorithms that use matrixes are Warshall's and Floyd's Algorithms, which are both of order n^3 and iterate through the matrixes doing either addition (in the case of Floyd) or ANDing the bits together (Warshall).

Another case of where matrixes can be used is collision detection. In a 2d game, if you want to do pixel perfect collision detection, you can create a bit matrix which is almost like an overlay to the sprite of a character. In the bit matrix, a 1 represents where the corresponding pixel on the sprite is filled in, and a 0 represents where the corresponding pixel on the sprite is supposed to be transparent, or empty. Then you can use AND operations and a nested loop to produce a resultant matrix and see if a collision occurred between two characters or two objects and also tell exactly where it occurred.

So all these uses for matrixes, and yet computers are not appropriately equipped with tools to do matrix operations!

In almost all of those algorithms and cases I mentioned above, nested loops are needed to do the operations and iterate through each of the matrixes to appropriately hit each element in the matrix.

Warshall's is O(n^3)
Floyd's is O(n^3)
ANDing two bit-matrixes together is O(n^2)
Doing a join operation is O(n^2)
matrix multiplication is O(n^2)

Now, would it not be easier to be able to say:

Matrix A = { some stuff };
Matrix B = { some stuff };
Matrix C = A & B;

Of course in C/C++/C# you could very easily overload the & operator and then you would be able to do that, but the fact is, when you overload the & operator, the underlying code in that function you are creating is still of order n^2, with two nested loops inside of it.

As far as I know, there is no hardware on the general market that exists to make matrix operations faster for computers. If there is any that private businesses use, or if there is any that I do not know of (do video cards have some built in???), please tell me.

But anyways, in the general sense there is no hardware available that I know of that makes matrix operations fast and easy.

Why not make a chip that can handle matrixes of up to a certain size, and then install that in computers, and then when the programmer knows he is doing some matrix operation within the bounds of that chip, he can send the matrixes to that chip and it will pump out the result in no time, with no nested loops required.

Now, there are a couple different cases. In one case you could just be sending bit matrixes to this chip, but in the other case you could be sending floating point numbers or integers, depending on the operation you are doing on the matrix.

Lets say for the moment that our chip can handle bitwise matrix operations between 2 different matrixes of max size 300x300. You could put the appropriate amount of inputs on the chip, send it through some AND circuits, and then output the appropriate matrix.

In the case of floating point numbers or integers, many more inputs would be required on the chip because each integer and float is 32 bits (and if we use double precision each one is 64 bits). That would greatly increase the number of inputs, and probably make the chip bigger, and an APU would also need to be put into the chip for each operation on the numbers, but that shouldn't be too hard considering the technology we have these days.

What do you think?