Thread: Matrix and vector operations on computers

1. Matrix and vector operations on computers

I thought about posting this on the general c++ board, but as it relates to more theory, programming, and engineering in general instead of being specific to c++, i decided to post it here.

Many times as programmers we are presented with the problem of doing some operation on vectors and matrixes. Some examples of this is in relational algebra, when we have to do matrix multiplication, join operations, select operations, and project operations.

Some algorithms that use matrixes are Warshall's and Floyd's Algorithms, which are both of order n^3 and iterate through the matrixes doing either addition (in the case of Floyd) or ANDing the bits together (Warshall).

Another case of where matrixes can be used is collision detection. In a 2d game, if you want to do pixel perfect collision detection, you can create a bit matrix which is almost like an overlay to the sprite of a character. In the bit matrix, a 1 represents where the corresponding pixel on the sprite is filled in, and a 0 represents where the corresponding pixel on the sprite is supposed to be transparent, or empty. Then you can use AND operations and a nested loop to produce a resultant matrix and see if a collision occurred between two characters or two objects and also tell exactly where it occurred.

So all these uses for matrixes, and yet computers are not appropriately equipped with tools to do matrix operations!

In almost all of those algorithms and cases I mentioned above, nested loops are needed to do the operations and iterate through each of the matrixes to appropriately hit each element in the matrix.

Warshall's is O(n^3)
Floyd's is O(n^3)
ANDing two bit-matrixes together is O(n^2)
Doing a join operation is O(n^2)
matrix multiplication is O(n^2)

Now, would it not be easier to be able to say:

Matrix A = { some stuff };
Matrix B = { some stuff };
Matrix C = A & B;

Of course in C/C++/C# you could very easily overload the & operator and then you would be able to do that, but the fact is, when you overload the & operator, the underlying code in that function you are creating is still of order n^2, with two nested loops inside of it.

As far as I know, there is no hardware on the general market that exists to make matrix operations faster for computers. If there is any that private businesses use, or if there is any that I do not know of (do video cards have some built in???), please tell me.

But anyways, in the general sense there is no hardware available that I know of that makes matrix operations fast and easy.

Why not make a chip that can handle matrixes of up to a certain size, and then install that in computers, and then when the programmer knows he is doing some matrix operation within the bounds of that chip, he can send the matrixes to that chip and it will pump out the result in no time, with no nested loops required.

Now, there are a couple different cases. In one case you could just be sending bit matrixes to this chip, but in the other case you could be sending floating point numbers or integers, depending on the operation you are doing on the matrix.

Lets say for the moment that our chip can handle bitwise matrix operations between 2 different matrixes of max size 300x300. You could put the appropriate amount of inputs on the chip, send it through some AND circuits, and then output the appropriate matrix.

In the case of floating point numbers or integers, many more inputs would be required on the chip because each integer and float is 32 bits (and if we use double precision each one is 64 bits). That would greatly increase the number of inputs, and probably make the chip bigger, and an APU would also need to be put into the chip for each operation on the numbers, but that shouldn't be too hard considering the technology we have these days.

What do you think?

2. > Why not make a chip that can handle matrixes of up to a certain size
Well off you go, do it and find out.
Mostly it boils down to this - silicon real estate is expensive, you can only put so much down on a chip, and it all draws power whether you use it or not. No-one is going to blow 1M++ transistors on some esoteric problem which gets used by such a small minority of programmers for such a small minority of problems, when there are much better things like caches, branch prediction etc which benefit everyone all the time. Changes to the use of silicon are modelled against a wide range of code, and only changes which improve the overall performance get committed to real silicon.

> bitwise matrix operations between 2 different matrixes of max size 300x300.
That's 90K bits (over 10K bytes of information)
Load 2 lots of 10K of bitmap into the processor, 32 bits at a time. Do you imagine this isn't O(n^2)?
do some mythical "magic" in one clock tick
Store 1 lot of 10K of bitmap result back out to memory. This is O(n^2) as well.
But wait, searching the result matrix for the set bit which indicates a collision is O(n^2)
So all in all, the problem is still O(n^2)

Any half decent algorithm would have optimised the search to exit as soon as a match was found, there is simply no need to bitwise-and the entire matrix just to find out if two of them share the same bit in any position. An algorithm which can be further improved by carefully choosing which sides of the two bitmaps to start comparing first.

> As far as I know, there is no hardware on the general market that exists to make matrix operations faster
Sure there is - you just shell out lots of \$\$\$ on vector processor based machines such as cray
But you have to need to do that, like that is all you ever do, day in and day out.

3. Video cards do matrix transformations rather quickly. Modern APIs construct the matrices and the card transforms the vertices...given it supports hardware transforms and hardware vertex processing.

Any 2D array can be accessed as a 1D array. (row*width)+column or one of those variations(row/column major, column/row major etc) can be used. If the array is a power of 2...which it should be...the operation becomes a simple bit shift and an add.

4. I think part of the reason that there is no specific hardware for matrix operations is that a mathematical 'matrix' is an abstract mathematical entity. Few of the people that use matrices (actually none that I know) can prove *why* matrix operations even work (I've done quite a bit of work with matrices, even invented an algorithm using their properties to implement a sort of neural net that can resolve ideal paths in 3D pased on basic parameters, i.e closest distance). That said, it is always going to be broken down into a bunch of operations on a bunch of data types (int or float) whether the hardware "knows" it is doing this to a matrix at the electrical level or not.

SIMD operations and the paradigm of parallel programming optimize these types of operations greatly without having to waste circuit board space.

EDIT:
Video cards do matrix transformations rather quickly. Modern APIs construct the matrices and the card transforms the vertices...given it supports hardware transforms and hardware vertex processing.

Any 2D array can be accessed as a 1D array. (row*width)+column or one of those variations(row/column major, column/row major etc) can be used. If the array is a power of 2...which it should be...the operation becomes a simple bit shift and an add.
there is not, however, any special circuitry built into video cards that says 'this is a matrix, treat it as such'. It is still just a floating point unit saying 'this is a float, treat it as such'.

5. I think this is a really good topic though, and I think david does fundamentally have a very interesting idea.

1) why do you assume that hardware designed for matrix operations would be faster than the instructions used in today's traditional world of programming? The only real benefit I can see is getting rid of temporary variables and the increment operator in your nested for loops, otherwise it's performing the exact same operations as is put out from the assembly you would have otherwise generated. The only other types of optimizations possible would be things like, as I mentioned before, SIMD operations, and, better algorithms, unless there are other optimizations I missed which someone can mention

2) How would you take care of the discrepancies in sizes of matrices? There are potentially many different possible sizes of matrices, varying in the row and column dimension. What would the smallest matrix be?

3) Would the smallest matrix be the same size as a float anyway?

4) What would be the largest matrix size possible?

6. Originally Posted by Silvercord
1) why do you assume that hardware designed for matrix operations would be faster than the instructions used in today's traditional world of programming? The only real benefit I can see is getting rid of temporary variables and the increment operator in your nested for loops, otherwise it's performing the exact same operations as is put out from the assembly you would have otherwise generated. The only other types of optimizations possible would be things like, as I mentioned before, SIMD operations, and, better algorithms, unless there are other optimizations I missed which someone can mention
Well, I assume you have probably studied at least a lit about circuits and such. Basically think of stuff like selectors and other things that have multiple inputs, and depending what it is, have 1 output or several outputs. In such things, all the inputs go in relatively at the same time. In a selector you could have 4 inputs all going in at the same time, and then 2 more inputs telling it which bit to select, and boom, it immediately does it. It does not need to iterate through the bits until it gets to the correct one. It simply does some boolean operations and goes to the correct switch. The correct value then gets outputted out of the selector.

The same methodologies could be done with a chip designed for matrix math.

Lets go a little more specific and take a chip, for example, that is specifically designed to do a bitwise AND operation on two different matrixes. Grab the memory and send it through the pipeline and to the chip, and all the inputs reach the chip at once. Then each input is automatically ANDed with its corresponding bit in the other matrix (all of them are ANDed simulanteously). It would be very easy to AND them all simulaneously, as bit[0][0] from Matrix A could very easily be hardwired to be ANDed with bit[0][0] from Matrix B, and so on and so forth for the rest of the bits, and each pair of bits could have their corresponding AND gate. The result is then all outputted simulatenously (maybe stored in a temporary register, or several temporary registers), and then pushed down the pipeline again to memory where it is finally stored.

Originally Posted by Silvercord
2) How would you take care of the discrepancies in sizes of matrices? There are potentially many different possible sizes of matrices, varying in the row and column dimension. What would the smallest matrix be?
Well, there are a couple of options. Option A is you could just let the chip not worry about it, and then all resultant matrixes would be outputted at the max matrix size, according to what size the chip allows. Or, there is Option B, which would probably hinder speed, is give it parameters, and then you could use some sort of a selector to retrieve the proper bits in the result.

Originally Posted by Silvercord
3) Would the smallest matrix be the same size as a float anyway?
not sure...I havent worked this out much in my mind for integer and floating point matrixes....i have mostly thought about bit matrixes....

Originally Posted by Silvercord
4) What would be the largest matrix size possible?
That would certainly depend on the manufacturer, and also you could produce different types of these chips. Maybe you could have one chip which allows a max of 5x5 matrixes, another which is 100x100, another which is 300x300, and even another which is 1000x1000 or greater.

7. I know the basics about gates and the bare essentials of what goes on at the electrical level, but this is not saying much. I think this is very interesting and I'm glad you responded. As an aside, I think that whatever Salem says should be respected the most because he is a genius, albeit a cprogramming.com stoic. In fact, I am quite certain he is not human.

8. Well it is possible in hardware because it is possible in assembly. Any code on any computer can be done in hardware and if you do research on assembly language you will find that this is where it came from.

I'm sure that modern video cards even for the PC do in fact have circuits that already perform this math at break neck speeds.

We all know a simple transform is just a bunch of matrix concatenations. But I'm really not sure that thinking in terms of 2D matrices is the correct way to go. In memory it is just linear so to have hardware that operated differently wouldn't work.
And to do an AND on a matrix is very simple and very fast even in software - again treat the matrix as a 1D array in memory.

I'm really not sure what benefit any of this would have for computers. But if you are talking about hardware that was specifically designed for just 3D graphics then yes, perhaps a new scheme would be better. But the current one does just fine and to do what you are saying would also entail altering the entire circuit schematic and layout of the entire computer. The current video rendering scheme must fit with the computer architecture - namely byte addressable. A computer is not always going to be doing 3D...maybe its simply doing a report for work or browsing the net. I fail to see how matrices would help any of these mundane tasks.

So basically I think I'm saying what Salem already did. No one is going to make just a 3D math machine because computers are used for a lot more than just that. Silicon graphics probably has some very nice vector processors but for the PC we will have to live with what Nvidia and Radeon throw at us.

MMX comes very close to some of what you are saying albeit its not quite all about matrices but it does do SIMD. Unfortunately the GPUs of today are faster than MMX.

9. >A computer is not always going to be doing 3D...maybe its simply doing a report for work or browsing the net.

Matrixes are not only used for graphics. Take Warshall's, Floyd's, and Prim's algorithms for example. They are all related to finding shortest paths and possible paths and use matrixes quite heavily, and are all used quite a bit in networks.

10. Matrixes are not only used for graphics. Take Warshall's, Floyd's, and Prim's algorithms for example. They are all related to finding shortest paths and possible paths and use matrixes quite heavily, and are all used quite a bit in networks.
exactly, as Silvercord had mentioned about his little neural net pathfinder, they have lots of uses in everyday applications, as that pathfinder could be applied to a physical network, deciding which routes a packet must take to arrive the fastest and maintain integrity in its structure.

11. A number of people/companies have been fooling around with the idea of using GPUs (which are essentially vector processors, evolved from the chips that used to power supercomputers) for doing things other than graphics. Check this out: http://developers.slashdot.org/artic...id=152&tid=185

12. I read that, it's very very interesting to say the least.