# Thread: Combining several variables into one

1. Originally Posted by h3ro
I have read that c++ does not have a standard 64bit value, so is this simply impossible?
The problem is that the left-shift operator is defined such that the result has the type of the (promoted) left operand, a plain int here, so shifting by 32 bits or more pushes the values off the end, into nothing (formally it's undefined behaviour). Try explicitly casting the a, b, c variables to __int64 before doing the shift.
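For instance, a minimal sketch using the standard uint64_t instead of the Microsoft-specific __int64 (the function name is made up):

```cpp
#include <cstdint>

// Cast to 64 bits *before* shifting; without the cast, hi << 32
// shifts a 32-bit int past its own width, which is undefined.
uint64_t combine(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```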

2. Code:
```uint64_t test = 1 | (2 << 8) | (3 << 16) | (4 << 24)
              | (5ULL << 32) | (6ULL << 40) | (7ULL << 48) | (8ULL << 56);

cout << " newA = " << (test & 0xff) << endl;
cout << " newB = " << ((test >> 8) & 0xff) << endl;
cout << " newC = " << ((test >> 16) & 0xff) << endl;
cout << " newD = " << ((test >> 24) & 0xff) << endl;
cout << " newE = " << ((test >> 32) & 0xff) << endl;
cout << " newF = " << ((test >> 40) & 0xff) << endl;
cout << " newG = " << ((test >> 48) & 0xff) << endl;
cout << " newH = " << ((test >> 56) & 0xff) << endl;```
This code works for me.
But really, if you don't cast to the appropriate type (or use a suffixed constant like I did), you WILL get a warning that the shift count exceeds the width of the type (a level 1 warning), and the behaviour is undefined.

3. This also seems to work well. It combines the vars, but there's no reason you couldn't do it in reverse.

Code:
```// swab() copies its input while swapping each pair of adjacent bytes.
// It is declared in <unistd.h> on POSIX systems; Visual C++ has _swab
// in <stdlib.h>.
unsigned __int64 result = 0;
unsigned int right = 0xAAAAAAAA;
unsigned int left = 0xBBBBBBBB;

swab((char *)&right, (char *)&result, sizeof(right));
swab((char *)&left, ((char *)&result) + sizeof(right), sizeof(left));```
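Note that swab() swaps each pair of adjacent bytes as it copies, so the halves land byte-swapped in result. If no swapping is wanted, a plain memcpy does the same combine, and the reverse split, byte-for-byte (a sketch; the function names are made up):

```cpp
#include <cstdint>
#include <cstring>

// Copy two 32-bit halves into a 64-bit value, first half lowest in memory.
uint64_t combineHalves(uint32_t first, uint32_t second)
{
    uint64_t out = 0;
    std::memcpy(&out, &first, sizeof first);
    std::memcpy((char *)&out + sizeof first, &second, sizeof second);
    return out;
}

// The reverse: split the 64-bit value back into its two halves.
void splitHalves(uint64_t in, uint32_t *first, uint32_t *second)
{
    std::memcpy(first, &in, sizeof *first);
    std::memcpy(second, (char *)&in + sizeof *first, sizeof *second);
}
```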

4. Also, using 64-bit values on a 32-bit processor (or a 64-bit processor running a 32-bit OS) is just making it slower - the processor deals with 64-bit numbers as two 32-bit numbers anyways (unless you use hand-coded SSE or MMX code).

--
Mats

5. Originally Posted by h3ro
What I want is calculating two screen pixels at the same time. I'm not sure that I can do that with a struct?
Why not?

6. The more I read and learn about this, the more I see that there is a lot I don't understand.

Also, using 64-bit values on a 32-bit processor (or a 64-bit processor running a 32-bit OS) is just making it slower - the processor deals with 64-bit numbers as two 32-bit numbers anyways (unless you use hand-coded SSE or MMX code).

--
Mats
The program will run on a 64-bit processor under 64-bit Vista.
What would be better for me to look into, SSE or MMX?

From Wikipedia I read that SSE added eight new 128-bit registers. What does this really mean, and why is that a good thing?

Does it mean that I can do (2 x 64-bit) x 8 in one cycle?

Why not?
I have no idea really. I'm just guessing here. So would this be the same as the above code?
Code:
```struct bigVar
{
    char a;
    char b;
    char c;
    char d;
    char e;
    char f;
    char g;
    char h;
};```

7. Originally Posted by h3ro
I have no idea really. I'm just guessing here. So would this be the same as the above code?
Code:
```struct bigVar
{
    char a;
    char b;
    char c;
    char d;
    char e;
    char f;
    char g;
    char h;
};```
Never mind. You're right, doing this with a struct would not be optimal.

The truth is, this kind of optimization is best done with inline assembly.

Barring that, you can use the Windows LOBYTE family of macros, which may be implemented as compiler intrinsics, allowing the compiler to use partial registers to access those variable parts. This is particularly likely if Visual Studio is your IDE of choice. They might also be implemented as inline assembly.

You could use a combination of bit shifting and macros. That way you won't be limited by the macros' inability to get bytes from the higher-order parts of the word, but you will still be able to benefit from their trivial implementation where they apply.
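For example, a shift-and-mask helper in the spirit of LOBYTE, generalised to reach any byte of a 64-bit word (byteAt is a made-up name, not a Windows macro):

```cpp
#include <cstdint>

// Like LOBYTE, but the shift lets it reach any of the eight bytes,
// not just the lowest one. index 0 is the least significant byte.
inline uint8_t byteAt(uint64_t word, unsigned index)
{
    return (uint8_t)((word >> (index * 8)) & 0xff);
}
```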

8. Originally Posted by h3ro
The more I read and learn about this, the more I see that there is a lot I don't understand.

The program will run on a 64-bit processor under 64-bit Vista.
What would be better for me to look into, SSE or MMX?

From Wikipedia I read that SSE added eight new 128-bit registers. What does this really mean, and why is that a good thing?

Does it mean that I can do (2 x 64-bit) x 8 in one cycle?

I have no idea really. I'm just guessing here. So would this be the same as the above code?
Code:
```struct bigVar
{
    char a;
    char b;
    char c;
    char d;
    char e;
    char f;
    char g;
    char h;
};```

Why not

Code:
`char twoPixels[8]`
a->0
b->1

etc...

?

I don't know much about speed optimization, so I am asking as much as suggesting here...

9. Originally Posted by h3ro
The program will run on a 64bits processor under Vista 64bit.
What would be better for me to look into, SSE or MMX?
SSE is always better for crunching numbers. You can look into intrinsics. They allow you to use SSE without assembly.

From Wikipedia I read that SSE added eight new 128bit registers. What does this really mean and why is that a good thing?

Does it mean that I can do (2 x 64bit) x 8 in one cycle?
Yes. You can compute 8 64-bit numbers in parallel.

Originally Posted by m37h0d
Why not

Code:
`char twoPixels[8]`
a->0
b->1

etc...

?

I don't know much about speed optimization, so I am asking as much as suggesting here...
This would mean putting each char into its own register to perform instructions on them, which would be a waste. It's better to operate on larger units of data.

I'm by no means an expert, but this is pretty basic.

10. Right. This is getting a bit complicated, but if you really want to do math on multiple pixels at a time and get some REAL benefit, you need to think about how you organize your pixels first. Make sure they are stored in such a way that you can easily just load 2 pixels in one go. One way to do that is to have an array that holds the pixel values.

Next, we should probably use SSE (or MMX if you want to run on really old processors) for efficiency. Trying to make two 32-bit integers work right in a 64-bit integer is going to be a pain. SSE has nice compartmentalization of the data, so there's no problem with one 32-bit number overflowing and contaminating the next one, for example.
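As an illustration of that compartmentalization, here is a small SSE2 intrinsics sketch (the function name is made up): each of the 16 bytes in a register is its own compartment, and the saturating add clamps at 255 instead of spilling into the neighbouring byte.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstdint>

// Add 16 pixel channel bytes at once. _mm_adds_epu8 is a saturating
// add: a channel that would overflow clamps at 255 rather than
// carrying into the byte next to it.
void addPixelBytes(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}
```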

Using intrinsics for SSE operations is a way to make it work without having to know inline assembler, but it's just as unportable as inline assembler, and in my experience it generates pretty poor code, since each intrinsic call is treated as a separate unit of calculation, so there's no effort from the compiler to actually keep data in the same register from one function to the next, for example.

--
Mats

11. >>so there's no effort from the compiler to actually keep data in the same register from one function to the next

Indeed, matsp is dead on about that. Mysteriously, the MS pages say it tries to preserve assembler registers; however, it does so only within the scope of a single function. This is a task best suited to writing the whole function in assembler, in my opinion...

12. Originally Posted by master5001
>>so there's no effort from the compiler to actually keep data in the same register from one function to the next

Indeed, matsp is dead on about that. Mysteriously, the MS pages say it tries to preserve assembler registers; however, it does so only within the scope of a single function. This is a task best suited to writing the whole function in assembler, in my opinion...
Of course, only AFTER you have confirmed that the code you are working on can't be improved by other means. Writing inline assembler (or assembler functions) should always be the last resort. [Although I find it fun to write assembler, so I will jump in head first to solve problems that way, rather than find a better algorithm first].

--
Mats

13. I never find assembly to work as I want it to >_<
Plus it seems awfully complicated to have to put stuff into particular registers to perform instructions, instead of telling the instruction which registers the data is stored in and where to put the result, or just passing the data without needing a specific register.
For example, mul is such an opcode. It should be more like add.

14. Originally Posted by Elysia
I never find assembly to work as I want it to >_<
Plus it seems awfully complicated to have to put stuff into particular registers to perform instructions, instead of telling the instruction which registers the data is stored in and where to put the result, or just passing the data without needing a specific register.
For example, mul is such an opcode. It should be more like add.
This goes back to the original x86 design. At the time, multiply was quite a rare operation (many things can be done with shifts instead, and when your multiply takes 100 clock cycles and a shift takes 4-8, you can do quite a few shifts before you worry about using multiply), and code was written in assembler anyway, so the programmer could plan to use the right register in the first place.

Other architectures have fewer limitations, although the PDP-11, for example, required that the DIV or MUL operation use an even-numbered destination register, as it used reg+1 to store the remainder or the high part of the result.

But most operations, in 32/64-bit x86, are pretty flexible compared to the old days. It's only a handful of instructions that actually only work with a particular set of registers:
MUL, DIV, MOVS/LODS/STOS/INS/OUTS, LOOP[Z, NZ], shift/rotate with non-constant amount, segment register loads.
These are relatively rare in comparison to other operations.
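As a concrete illustration of MUL's fixed registers, a GCC/Clang inline-assembler sketch for x86 (not portable, and the function name is made up): one factor must sit in EAX, and the 64-bit product comes back split across EDX and EAX.

```cpp
#include <cstdint>

// MUL gives no choice of registers: one factor is implicitly EAX,
// and the full 64-bit product lands in EDX (high half) : EAX (low half).
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint32_t lo, hi;
    asm("mull %2" : "=a"(lo), "=d"(hi) : "r"(b), "a"(a));
    return ((uint64_t)hi << 32) | lo;
}
```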

For SSE operations, it's simpler again, since they are all orthogonal. All SSE registers can be source or destination for all operations, and you can even use a memory location as source or destination (but not both at the same time).

--
Mats

15. I agree about assembler being fun to work with. Typically a compiler's code will be more optimized than what you or I would write offhand, but if you are after making something as optimal as humanly possible, hand-optimized assembler is way better than what any machine can do. I am not one of those engineers who can tell you exactly how many ticks a piece of code takes just by reading the instruction set, though, so alas, this human will not always be able to beat a machine *sigh*
