Combining several variables into one



04-24-2008
matsp

Quote:

Originally Posted by master5001

I agree about the assembler being fun to work with bit. Though I will agree typically a compiler's code will be more optimized than what you or I would write offhand, I can also say that if you are after making something as optimal as humanly possible, hand optimized assembler is way better than what any machine can do. But I am not one of those engineers who can tell you exactly how many ticks it takes to execute a piece of code just by reading an instruction set. So alas, this human will not always be able to beat a machine *sigh*

Modern machines are so complex that it can be near impossible to say how long it takes to execute a sequence of instructions. It's easier on older processors, where instructions were executed strictly in order and serially. Nowadays, they execute out of order and in parallel, so it's much harder to tell how long it will take. The key is to profile A LOT, and of course remove instructions that are not needed, avoid re-calculating results that have already been calculated, etc.

--
Mats
04-24-2008
master5001

Yeah I know what you mean. I remember in the DOS days you could hand optimize a program and see a significant difference in performance, nowadays you are lucky to even be able to measure the difference in cpu ticks, let alone actual calculable time.
04-24-2008
vart

Quote:

Originally Posted by master5001

Yeah I know what you mean. I remember in the DOS days you could hand optimize a program and see a significant difference in performance, nowadays you are lucky to even be able to measure the difference in cpu ticks, let alone actual calculable time.

You just use modern tools to get proper results...
04-24-2008
m37h0d

Quote:

Originally Posted by Elysia

This would mean putting each char into its own register to perform instructions on them which would be a waste. It's better to use larger data.

I'm by no means an expert, but this is pretty basic.

perhaps basic, but certainly not self-evident! good to know, thank you.

so...

what about

int32 lpix;
int32 rpix;

int64 twoPix = rpix+0x100000000*(int64)lpix;

?
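For reference, a minimal sketch of that packing in standard C. It assumes unsigned 32-bit pixels (uint32_t and uint64_t from <stdint.h> stand in for the int32 and int64 above), and the function names are made up for illustration. With signed values a negative rpix would sign-extend and overwrite the upper half.

```c
#include <stdint.h>

/* Pack two 32-bit pixels into one 64-bit word:
   left pixel in the high half, right pixel in the low half.
   Multiply form, as written in the post above. */
uint64_t pack_mul(uint32_t lpix, uint32_t rpix)
{
    return (uint64_t)rpix + UINT64_C(0x100000000) * (uint64_t)lpix;
}

/* Same result using a shift and a bitwise OR. */
uint64_t pack_shift(uint32_t lpix, uint32_t rpix)
{
    return (uint64_t)rpix | ((uint64_t)lpix << 32);
}

/* Unpack the two halves again. */
uint32_t left_pixel(uint64_t twoPix)
{
    return (uint32_t)(twoPix >> 32);
}

uint32_t right_pixel(uint64_t twoPix)
{
    return (uint32_t)(twoPix & 0xFFFFFFFFu);
}
```

Both pack functions give the same 64-bit value for unsigned inputs, so the choice between them is about readability and speed, not results.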
04-24-2008
Elysia

What exactly are you asking?
04-24-2008
m37h0d

i'm wondering the relative speed between that and swab, or some other method.
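One way to get an answer is simply to time it. The following is a rough sketch only: time_pack_loop is a hypothetical helper, the test data is arbitrary, and clock() is a coarse timer, so the loop needs many iterations before the numbers mean anything. An optimizing compiler will often turn the multiply by a power of two into a shift anyway, so the two variants may well come out identical.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: packs n pixel pairs with either the multiply
   form (use_shift == 0) or the shift form (use_shift != 0).
   A volatile checksum keeps the compiler from removing the loop.
   Returns the elapsed CPU time in seconds. */
double time_pack_loop(int use_shift, uint32_t n, uint64_t *checksum)
{
    volatile uint64_t sink = 0;
    clock_t start = clock();

    for (uint32_t i = 0; i < n; ++i) {
        uint32_t lpix = i * 2654435761u;   /* arbitrary test data */
        uint32_t rpix = ~lpix;
        uint64_t twoPix = use_shift
            ? ((uint64_t)rpix | ((uint64_t)lpix << 32))
            : ((uint64_t)rpix + UINT64_C(0x100000000) * (uint64_t)lpix);
        sink += twoPix;
    }

    clock_t end = clock();
    *checksum = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call it once with use_shift set to 0 and once with 1, using the same n, and compare both the returned times and the checksums (which should match).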
04-24-2008
h3ro

Ok, just to recap here now:

By using SSE on an Intel 64-bit processor under Vista 64 I can do:
2 * 64bit variables for each SSE register, and there are 8 registers so it will be (2 * 32bit) * 8

Quote:

Right. This is getting a bit complicated, but if you really want to do math on multiple pixels at a time and get some REAL benefit, you need to think about how you organize your pixels first. Make sure they are stored in such a way that you can easily just load 2 pixels in one go. One way to do that is to have an array that holds the pixel values.

You are right. In the article I posted he does the bit shifting for two pixels every time he draws them. I will try to have a doublePixel array for my data.

Quote:

Next, we should probably use SSE (or MMX if you want to run on really old processors) for efficiency. Trying to make two 32-bit integers work right in a 64-bit integer is going to be a pain. SSE has nice compartmentalization of the data, so there's no problem with one 32-bit number overflowing and contaminating the next one, for example.

Why is it so painful and how do I work around it?

I assume the same problem applies to 2 x 16 bit in a 32-bit as well? I am going to use the article as a guide, as I am really out in deep water here.
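To make the overflow problem concrete, here is a small sketch (function names are hypothetical, and it assumes unsigned lanes that should each wrap on their own). A plain 64-bit add lets a carry out of the low 32-bit pixel spill into the high one. One known workaround is to add the lower 31 bits of each lane, where no carry can cross the lane boundary, and then patch the top bit of each lane back in with XOR. The same idea works for 2 x 16 bit in a 32-bit word with a different mask.

```c
#include <stdint.h>

/* Naive: a single 64-bit add over two packed 32-bit lanes.
   A carry out of the low lane contaminates the high lane. */
uint64_t add_naive(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Lane-safe add: each 32-bit lane wraps independently.
   Step 1: clear the top bit of every lane and add; the sum of two
           31-bit values fits in 32 bits, so nothing crosses lanes.
   Step 2: the correct top bit of each lane is a ^ b ^ carry-in;
           the carry-in is already in place from step 1, so XOR in
           the top bits of a and b. */
uint64_t add_lanes(uint64_t a, uint64_t b)
{
    const uint64_t HIGH = UINT64_C(0x8000000080000000);
    uint64_t low = (a & ~HIGH) + (b & ~HIGH);
    return low ^ ((a ^ b) & HIGH);
}
```

With a left pixel of 1 and a right pixel of 0xFFFFFFFF, adding 1 to the right pixel through add_naive bumps the left pixel to 2, while add_lanes leaves it at 1 and wraps the right pixel to 0. The extra masking is the "pain" mentioned in the quote; SSE does this separation in hardware.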

Quote:

Using intrinsics for SSE operations is a way to make it work without having to know inline assembler, but it's just as unportable as inline assembler, and it's, in my experience, generating pretty poor code, since each intrinsic call is treated as a separate unit of calculation, so there's no effort from the compiler to actually keep data in the same register from one function to the next, for example.

Portability is not really a problem for me. As long as it runs on my PC I'm happy. It's not like I'm making the new GTA6 here.

Quote:

Of course, only AFTER you have confirmed that the code you are working on can't be improved by other means. Writing inline assembler (or assembler functions) should always be the last resort. [Although I find it fun to write assembler, so I will jump in head first to solve problems that way, rather than find a better algorithm first].

I have posted my algorithm before and I think it is as good as it can get. To be honest I really enjoy learning about this kind of stuff. I never knew that there was such a thing as MMX or SSE that could speed things up that much.

Had an interesting guest lecture today about general purpose GPU programming. Really interesting stuff there as well. Never knew that you could write pixel and vertex shaders to calculate fluid dynamics.

Quote:

int64 twoPix = rpix+0x100000000*(int64)lpix

I'm not sure, but I think bitshifting is faster.

rpix | ((int64)lpix<<32) ?

I really appreciate all the input and information you people have posted. My university should hire several of you, as I get a lot more (and better) help from here than at school.

Thanks
04-24-2008
Elysia

Quote:

Originally Posted by h3ro

By using SSE on an Intel 64-bit processor under Vista 64 I can do:
2 * 32bit variables for each SSE register, and there is 8 registers so it will be( 2 * 32bit) * 8

Nope. Remember that they're 128 bits! That means you can stuff 4 32-bit values inside!
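As a rough illustration of four 32-bit values in one 128-bit register, here is a sketch using the SSE2 intrinsics from <emmintrin.h>. The function name add4_u32 is made up for this example, and the scalar fallback is only there so the code still builds where SSE2 is not available.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 integer intrinsics */
#define HAVE_SSE2 1
#endif

/* Adds four 32-bit values lane by lane: out[i] = a[i] + b[i].
   Each lane wraps on its own; no carry moves between lanes. */
void add4_u32(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
#ifdef HAVE_SSE2
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* One instruction adds all four 32-bit lanes. */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Scalar fallback with the same per-lane behaviour. */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

If one lane holds 0xFFFFFFFF and 1 is added to it, that lane wraps to 0 and its neighbours are untouched, which is the compartmentalization mentioned earlier in the thread.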
04-24-2008
h3ro

Sorry about that. I meant to write 64. For some reason I believed 32 * 2 = 128 today :P
04-24-2008
laserlight

Quote:

Sorry about that. I meant to write 64. For some reason I believed 32 * 2 = 128 today :P

You could just pretend you saw << instead of * ;)
