Ok, just to recap here now:
By using SSE on an Intel 64-bit processor under Vista 64 I can do:
2 * 64-bit variables (or 4 * 32-bit) in each 128-bit SSE register, and there are 8 registers (16 in 64-bit mode), so with 8 registers it will be (2 * 64-bit) * 8
Quote:
Right. This is getting a bit complicated, but if you really want to do math on multiple pixels at a time and get some REAL benefit, you need to think about how you organize your pixels first. Make sure they are stored in such a way that you can easily just load 2 pixels in one go. One way to do that is to have an array that holds the pixel values.
You are right. In the article I posted he does the bit shifting for two pixels every time he draws them. I will try to have a doublePixel array for my data.
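A minimal sketch of what such a doublePixel array could look like, assuming 32-bit pixels and the left pixel stored in the upper half (the same order as the twoPix line quoted further down). The names doublePixel and pack_row are just made up for illustration, they are not from the article:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: two 32-bit pixels stored in one 64-bit value. */
typedef uint64_t doublePixel;

/* Pack `count` 32-bit pixels into double pixels once, up front, so no
 * shifting is needed at draw time. The left pixel goes in the upper
 * 32 bits. An odd trailing pixel is paired with 0.
 * Returns the number of doublePixel entries written. */
static size_t pack_row(const uint32_t *src, size_t count, doublePixel *dst)
{
    assert(src != NULL && dst != NULL);

    size_t n = 0;
    for (size_t i = 0; i < count; i += 2) {
        uint32_t left  = src[i];
        uint32_t right = (i + 1 < count) ? src[i + 1] : 0;
        dst[n++] = ((uint64_t)left << 32) | right;
    }
    return n;
}
```

The destination needs room for (count + 1) / 2 entries. The point is that the packing cost is paid once when the data is built, not every time a pixel pair is drawn.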
Quote:
Next, we should probably use SSE (or MMX if you want to run on really old processors) for efficiency. Trying to make two 32-bit integers work right in a 64-bit integer is going to be a pain. SSE has nice compartmentalization of the data, so there's no problem with one 32-bit number overflowing and contaminating the next one, for example.
Why is it so painful and how do I work around it?
I assume the same problem applies to 2 x 16-bit in a 32-bit as well? I am going to use the article as a guide, as I am really out in deep water here.
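To make the overflow problem concrete for myself, here is a small sketch in plain C (no SSE), assuming unsigned 32-bit lanes packed in a 64-bit integer. The function names are made up. The first function shows the contamination, the second is one possible masking workaround, not necessarily what the article does:

```c
#include <assert.h>
#include <stdint.h>

/* Naive: one 64-bit add over two packed 32-bit lanes. If the low lane
 * overflows, its carry leaks into the high lane. */
static uint64_t add_naive(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Workaround: clear the top bit of each lane before adding, so no carry
 * can cross the lane boundary, then restore the top bits with XOR.
 * Each lane wraps around on its own, like a separate 32-bit add. */
static uint64_t add_lanes(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8000000080000000ULL; /* top bit of each lane */
    uint64_t sum = (a & ~H) + (b & ~H);
    return sum ^ ((a ^ b) & H);
}

/* Small self-check: high lane 1 + 1, low lane 0xFFFFFFFF + 1. */
static void check_lanes(void)
{
    uint64_t a = 0x00000001FFFFFFFFULL;
    uint64_t b = 0x0000000100000001ULL;
    assert(add_naive(a, b) == 0x0000000300000000ULL); /* high lane is 3: wrong */
    assert(add_lanes(a, b) == 0x0000000200000000ULL); /* high lane is 2: right */
}
```

As far as I can tell the same trick would work for 2 x 16-bit in a 32-bit integer with the mask 0x80008000. The extra masking on every operation is presumably the pain being referred to, and it is what the SSE packed instructions do for free.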
Quote:
Using intrinsics for SSE operations is a way to make it work without having to know inline assembler, but it's just as unportable as inline assembler, and it's, in my experience, generating pretty poor code, since each intrinsic call is treated as a separate unit of calculation, so there's no effort from the compiler to actually keep data in the same register from one function to the next, for example.
Portability is not really a problem for me. As long as it runs on my PC I'm happy. It's not like I'm making the new GTA6 here.
Quote:
Of course, only AFTER you have confirmed that the code you are working on can't be improved by other means. Writing inline assembler (or assembler functions) should always be the last resort. [Although I find it fun to write assembler, so I will jump in head first to solve problems that way, rather than find a better algorithm first].
I have posted my algorithm before and I think it is as good as it can get. To be honest I really enjoy learning about this kind of stuff. I never knew that there was such a thing as MMX or SSE that could speed things up that much.
Had an interesting guest lecture today about general purpose GPU programming. Really interesting stuff there as well. Never knew that you could write pixel and vertex shaders to calculate fluid dynamics.
Quote:
int64 twoPix = rpix+0x100000000*(int64)lpix
I'm not sure, but I think bitshifting is faster.
rpix | ((int64)lpix<<32) ?
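A quick sketch comparing the two ways of packing, assuming both pixels are unsigned 32-bit values (the function names are made up). With unsigned values the two give the same result, and an optimizing compiler will most likely turn the multiply by a power of two into a shift anyway, so I have not measured any difference:

```c
#include <assert.h>
#include <stdint.h>

/* Pack by multiplying the left pixel by 2^32, as in the quoted line. */
static uint64_t pack_mul(uint32_t lpix, uint32_t rpix)
{
    return (uint64_t)rpix + 0x100000000ULL * (uint64_t)lpix;
}

/* Pack by shifting the left pixel into the upper 32 bits. */
static uint64_t pack_shift(uint32_t lpix, uint32_t rpix)
{
    return (uint64_t)rpix | ((uint64_t)lpix << 32);
}

/* Unpack the two halves again. */
static uint32_t left_of(uint64_t twoPix)  { return (uint32_t)(twoPix >> 32); }
static uint32_t right_of(uint64_t twoPix) { return (uint32_t)twoPix; }

/* Small self-check that both versions agree and round-trip. */
static void check_packing(void)
{
    uint64_t two = pack_shift(0x11223344u, 0xAABBCCDDu);
    assert(two == pack_mul(0x11223344u, 0xAABBCCDDu));
    assert(left_of(two) == 0x11223344u);
    assert(right_of(two) == 0xAABBCCDDu);
}
```

One thing to watch: if rpix is a signed int and happens to be negative, it gets sign-extended to 64 bits and the OR version would overwrite the left pixel, so unsigned types seem like the safer choice here.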
I really appreciate all the input and information you people have posted. My university should hire several of you, as I get a lot more (and better) help from here than at school.
Thanks