Yeah, I know what you mean. I remember in the DOS days you could hand-optimize a program and see a significant difference in performance; nowadays you are lucky to even be able to measure the difference in CPU ticks, let alone actual calculable time.
I'm wondering about the relative speed between that and swab, or some other method.
Ok, just to recap here now:
By using SSE on an Intel 64-bit processor under Vista 64 I can do:
2 * 64bit variables for each SSE register, and there are 8 registers, so it will be (2 * 64bit) * 8
"Right. This is getting a bit complicated, but if you really want to do math on multiple pixels at a time and get some REAL benefit, you need to think about how you organize your pixels first. Make sure they are stored in such a way that you can easily just load 2 pixels in one go. One way to do that is to have an array that holds the pixel values."
You are right. In the article I posted he does the bit shifting for two pixels every time he draws them. I will try to have a doublePixel array for my data.
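To make the "load 2 pixels in one go" idea concrete, here is a minimal sketch. It assumes 32-bit pixels stored back to back in a plain array; `pixel_t`, `load_pixel_pair` and `store_pixel_pair` are made-up names for illustration, not anything from the article.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: each pixel is one 32-bit value and pixels are stored
   contiguously, so pixels 2*i and 2*i+1 sit next to each other. */
typedef uint32_t pixel_t;

/* Read pixels 2*pair_index and 2*pair_index+1 as one 64-bit value.
   memcpy sidesteps alignment and aliasing problems; compilers
   typically reduce it to a single 64-bit load. */
static uint64_t load_pixel_pair(const pixel_t *pixels, size_t pair_index)
{
    uint64_t pair;
    memcpy(&pair, &pixels[2 * pair_index], sizeof pair);
    return pair;
}

/* Write a 64-bit pair back over pixels 2*pair_index and 2*pair_index+1. */
static void store_pixel_pair(pixel_t *pixels, size_t pair_index, uint64_t pair)
{
    memcpy(&pixels[2 * pair_index], &pair, sizeof pair);
}
```

Which pixel ends up in the low half of the 64-bit value depends on the byte order of the machine, so any math done on the pair has to treat both halves the same way or account for that.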
"Next, we should probably use SSE (or MMX if you want to run on really old processors) for efficiency. Trying to make two 32-bit integers work right in a 64-bit integer is going to be a pain. SSE has nice compartmentalization of the data, so there's no problem with one 32-bit number overflowing and contaminating the next one, for example."
Why is it so painful and how do I work around it?
I assume the same problem applies to 2 x 16 bit in a 32-bit as well? I am going to use the article as a guide, as I am really out in deep water here.
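A small sketch of the "contamination" problem may help here. It assumes two unsigned 32-bit values packed into one 64-bit integer; the function names are made up for illustration. The masked version is one common workaround in plain C, not necessarily what the article does.

```c
#include <stdint.h>

/* Naive: one 64-bit add over both lanes. If the low 32-bit lane
   overflows, the carry leaks into the high lane. */
static uint64_t add_pairs_naive(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Lane-wise: clear the top bit of each lane so no carry can cross
   the lane boundary, add, then restore the top bits with XOR.
   Each 32-bit lane wraps around on its own. */
static uint64_t add_pairs_lanewise(uint64_t a, uint64_t b)
{
    const uint64_t top_bits = 0x8000000080000000ULL;
    uint64_t low_sum = (a & ~top_bits) + (b & ~top_bits);
    return low_sum ^ ((a ^ b) & top_bits);
}
```

With a = 0x00000001FFFFFFFF and b = 0x0000000000000001 the naive add gives 0x0000000200000000 (the high lane went from 1 to 2), while the lane-wise add gives 0x0000000100000000. The same trick should work for 2 x 16 bit in a 32-bit integer with the mask 0x80008000.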
"Using intrinsics for SSE operations is a way to make it work without having to know inline assembler, but it's just as unportable as inline assembler, and in my experience it generates pretty poor code: each intrinsic call is treated as a separate unit of calculation, so there's no effort from the compiler to actually keep data in the same register from one function to the next, for example."
Portability is not really a problem for me. As long as it runs on my PC I'm happy. It's not like I'm making the new GTA6 here.
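For reference, this is roughly what the intrinsic route looks like. It is only a sketch: `add4_u32` is a made-up helper, and the plain-C fallback is there so the code still builds where SSE2 is not available. The intrinsics themselves (`_mm_loadu_si128`, `_mm_add_epi32`, `_mm_storeu_si128`) come from `<emmintrin.h>`.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Add four 32-bit values lane by lane. Each lane wraps around on its
   own, so an overflow in one lane never touches its neighbour. */
static void add4_u32(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
#ifdef HAVE_SSE2
    /* Unaligned loads/stores, so the arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Scalar fallback with the same per-lane result. */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

How good the generated code is will depend on the compiler and its optimization settings, so it is worth looking at the assembler output before deciding between this and hand-written assembler.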
"Of course, only AFTER you have confirmed that the code you are working on can't be improved by other means. Writing inline assembler (or assembler functions) should always be the last resort. [Although I find it fun to write assembler, so I will jump in head first to solve problems that way, rather than find a better algorithm first]."
I have posted my algorithm before and I think it is as good as it can get. To be honest, I really enjoy learning about this kind of stuff. I never knew that there was such a thing as MMX or SSE that could speed things up that much.
Had an interesting guest lecture today about general-purpose GPU programming. Really interesting stuff there as well. Never knew that you could write pixel and vertex shaders to calculate fluid dynamics.
int64 twoPix = rpix+0x100000000*(int64)lpix
rpix | ((int64)lpix<<32) ?
I'm not sure, but I think bit shifting is faster.
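The two packing variants above can be written out as a minimal sketch, assuming unsigned 32-bit pixels (`pack_mul` and `pack_shift` are made-up names). With unsigned inputs both produce the same 64-bit value. With signed types a negative rpix would be sign-extended and spill into the upper half, so unsigned types are the safer choice here.

```c
#include <stdint.h>

/* Multiply variant: lpix * 2^32 moves lpix into the upper 32 bits. */
static uint64_t pack_mul(uint32_t lpix, uint32_t rpix)
{
    return rpix + 0x100000000ULL * (uint64_t)lpix;
}

/* Shift variant: the same placement, written as shift and OR. */
static uint64_t pack_shift(uint32_t lpix, uint32_t rpix)
{
    return (uint64_t)rpix | ((uint64_t)lpix << 32);
}
```

An optimizing compiler may well turn the multiplication by a power of two into a shift anyway, so measuring both is the only way to be sure which one is faster on a given setup.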
I really appreciate all the input and information you people have posted. My university should hire several of you, as I get a lot more (and better) help from here than at school.
Last edited by h3ro; 04-24-2008 at 10:08 AM. Reason: 64, not 32. Thanks Elysia
Sorry about that. I meant to write 64. For some reason I believed 32 * 2 = 128 today :P