As I see it, it is better to waste memory than do the conversion every time.
This code might be working with bitmaps as large as 4096x4096 pixels, so that is a lot of memory: an array of over 16 million elements. Plus, the bitmap is already laid out in a good format for checking a value's neighbors, which will be useful.
I will test both ways though to see what kind of speed hit the conversion produces. It might be negligible compared to the other calculations I do.
You have to test what is better. I recently ran tests on this kind of thing, and I have to tell you that it really depends. There is a trade-off between calculations and memory, and it can even vary from system to system. We don't have the whole code, so we can't say anything more.
Too much casting.
Code:
float r = (float) ((int) (val & 0x000000FF) >> 0) * div;
val is int. div is float. The result is float. So no casting is needed at all. When you have int * float, the int is converted to float. In any case you wanted this:
Code:
float r = (float)((float) ((val & 0x000000FF) >> 0) * div);
or simpler
Code:
float r = ((val & 0x000000FF) >> 0) * div;
An obvious optimization is:
Code:
float r = ((val & 0x000000FF) >> 0) / 255.0f;
The f suffix makes it a division with floats.
Also, the & 0x000000FF means that you keep the lowest 8 bits, i.e. 1 byte. So yeah, since you have a byte it makes no sense. Just omit it.
Thanks, yes all the casting may not be necessary.
I don't agree with the division optimization though. Division is a lot slower than multiplication, so since the division will happen often with the same value, it is better to calculate 1.0f / 255.0f once (outside the loop) and reuse that for multiplication.
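A minimal sketch of hoisting the reciprocal out of the loop, as described above. The function name `low_byte_to_unit_float` and the use of `uint32_t` for the packed pixels are assumptions made for illustration; they do not come from the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert the low byte of each packed pixel to a float in [0, 1].
   The reciprocal is computed once, outside the loop, so the loop body
   contains a multiplication instead of a division.
   (Hypothetical helper; names and types are assumptions.) */
void low_byte_to_unit_float(const uint32_t *pixels, float *out, size_t count)
{
    const float inv255 = 1.0f / 255.0f;  /* the only division performed */
    for (size_t i = 0; i < count; ++i) {
        out[i] = (float)(pixels[i] & 0xFFu) * inv255;
    }
}
```

Whether this beats writing `/ 255.0f` directly depends on the compiler and its floating-point settings, so it is worth timing both.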
You are probably right on that.
The other thing is to try treating the value as an array of bytes rather than using bitwise operations. Like:
Code:
unsigned char* ptr = (unsigned char*)&val;
float r = ptr[0] * div;
float g = ptr[1] * div;
float b = ptr[2] * div;
float a = ptr[3] * div;
This might be faster, might be slower. I would expect it to be a little faster. Try it.
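One thing to keep in mind with the byte-wise approach is that it depends on byte order, while shifts do not. The sketch below puts both side by side so they can be compared and timed. The function names and the `uint32_t` type are assumptions for illustration, and `memcpy` is used here instead of a pointer cast simply as a conservative way to look at the bytes.

```c
#include <stdint.h>
#include <string.h>

/* Split a packed 32-bit value into four floats in [0, 1] by reading its
   bytes directly. bytes[0] is the lowest-addressed byte, which is the
   least significant byte only on a little-endian machine. */
void unpack_by_bytes(uint32_t val, float out[4])
{
    const float inv255 = 1.0f / 255.0f;
    unsigned char bytes[sizeof val];
    memcpy(bytes, &val, sizeof val);  /* copy out the raw bytes */
    for (int i = 0; i < 4; ++i) {
        out[i] = (float)bytes[i] * inv255;
    }
}

/* Same idea with shifts: out[0] is always the least significant byte,
   regardless of the machine's byte order. */
void unpack_by_shifts(uint32_t val, float out[4])
{
    const float inv255 = 1.0f / 255.0f;
    for (int i = 0; i < 4; ++i) {
        out[i] = (float)((val >> (8 * i)) & 0xFFu) * inv255;
    }
}
```

On a little-endian machine the two functions should produce the same four values in the same order; on a big-endian machine the byte-wise version yields them reversed.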
Any decent compiler would transform a division-by-constant to a multiplication-by-inverse-constant anyway.
MagosX.com
Give a man a fish and you feed him for a day.
Teach a man to fish and you feed him for a lifetime.
It's still not clear why you want to do so much arithmetic on several million pixels.
Why would working with floats be slower than working with quadruples of chars? And why would checking neighbors be easier with 4 chars? If I understand your solution, you will have to look at all 4 chars for each comparison, since (for example) [255, 0, 0, 0] will be closer to [0, 255, 255, 255] than [0, 0, 0, 0] is to [0, 0, 0, 10]. Correct?
If you are starting out with float values to begin with, and are primarily trying to save space, why is EVOEx's solution not preferable? The bitmap can be an array of unions instead of an array of 4-tuples of chars. Initially you would store the float values there. Do whatever processing you need to do on them, and do only a single conversion at the end, storing the float in each pixel with the 4 chars to display the image?
What's wrong with that?
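A rough sketch of what an "array of unions" element could look like in C. The type name `PixelSlot` and the helper functions are hypothetical, and the sketch assumes a 4-byte `float`. Reading a union member other than the one last written is permitted in C but not in C++, so this is a C-only illustration.

```c
#include <stdint.h>

/* One pixel slot that can be viewed either as a float or as four 8-bit
   channels. Both members share the same 4 bytes of storage.
   (Hypothetical type; assumes sizeof(float) == 4.) */
typedef union {
    float   value;       /* used while processing */
    uint8_t channel[4];  /* used when the slot is treated as a pixel */
} PixelSlot;

_Static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

/* Store a float; its raw bytes become the four channels. */
void slot_store(PixelSlot *slot, float v)
{
    slot->value = v;
}

/* Rebuild a slot from four channel bytes, e.g. read back from a bitmap. */
PixelSlot slot_from_channels(const uint8_t c[4])
{
    PixelSlot s;
    for (int i = 0; i < 4; ++i) {
        s.channel[i] = c[i];
    }
    return s;
}
```

With this layout no arithmetic conversion happens at all: the float is stored bit for bit, so the four channels are just its raw bytes and will not look like a meaningful color.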
the one i posted here on page 3 is probably faster
http://cboard.cprogramming.com/showp...6&postcount=39
this still seems like madness though. why can't you just cast it??
Indeed, and in fact I was relying on this for the code I provided. Interestingly enough though, whether VS2005 and up do this sometimes depends on the floating point consistency model selected for the project. I mean for powers of two it'll surely convert to a multiplication, but for something like dividing by 3 it probably won't use a multiplication by 1/3 for the precise model, because 1/3 can't be represented exactly, whereas 3 can, and the multiplication result might differ by a couple of least-significant bits in the significand.
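The difference between dividing and multiplying by a rounded reciprocal can be checked directly. The sketch below counts how many integer inputs give different single-precision results for the two forms. The function name `count_mismatches` is made up for this example, and it assumes IEEE 754 single-precision floats without fast-math style compiler options.

```c
#include <stddef.h>

/* Count the inputs x = 1..n for which x / d and x * (1 / d) round to
   different floats. (Hypothetical helper for illustration.) */
size_t count_mismatches(float d, unsigned n)
{
    const float inv = 1.0f / d;  /* reciprocal, rounded to float once */
    size_t mismatches = 0;
    for (unsigned i = 1; i <= n; ++i) {
        /* volatile forces each result to be stored as a real float */
        volatile float q = (float)i / d;
        volatile float p = (float)i * inv;
        if (q != p) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

For a power of two such as 256.0f the reciprocal is exact, so the count is 0 for inputs in this range. For 3.0f mismatches do occur (x = 5 is one such input), which is why a compiler in a strict floating-point mode cannot make that substitution on its own.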
Yes division is certainly a lot slower than even 3 multiplications. Any reduction in the number of divisions performed has got to be a win. There's no harm in explicitly doing the optimisation of reducing the divides by hand here.
Note that with the method I posted, you don't need any premultiplication step. It should be able to operate on whatever range of values your float initially contains, and maintains accuracy for small and large values. By all means use whatever turns out to be fastest though.
Minor optimisation, always do the bitmasks after a right shift:
Code:
float thickness = 678460.545f;
unsigned int val = (unsigned int) (thickness * 1000.0f); //Gives me three decimals which is enough
float div = 1.0f / 255.0f;
float r = (val & 0xFF) * div;
float g = ((val >> 8) & 0xFF) * div;
float b = ((val >> 16) & 0xFF) * div;
float a = (val >> 24) * div;
It takes fewer bytes of machine code to represent smaller integer constants. And leaving out the last bitmask is safe so long as val is positive, hence I've made it unsigned to be sure. I'd also leave out the zero-shift even though it doesn't look as pretty and the compiler would have generated the same code anyway, but that's just me.
Last edited by iMalc; 12-21-2008 at 01:03 PM.
Advice: Take only as directed - If symptoms persist, please see your debugger
Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"
couldn't you also bitshift by 8 instead of dividing by 256?
because they've made it floating point multiplication? i guess the real answer is that they can't do that because that would shift all the data off the end, and so they'd get nothing.
this whole thing seems odd to me tho. seems like a lot of trouble to somewhat optimize an inherently inefficient system.
Last edited by m37h0d; 12-21-2008 at 01:41 PM.
Code:
byte b1 = (byte) ((int)(r * 255.0f));
byte b2 = (byte) ((int)(g * 255.0f));
byte b3 = (byte) ((int)(b * 255.0f));
byte b4 = (byte) ((int)(a * 255.0f));
int vv = 0;
vv += (int) ((b1 & 0x000000FF) << 0);
vv += (int) ((b2 & 0x000000FF) << 8);
vv += (int) ((b3 & 0x000000FF) << 16);
vv += (int) ((b4 & 0x000000FF) << 24);
float thickness2 = vv * 0.001f;
You have effectively removed over 2 billion colors by selecting int for vv. Colors are unsigned int, which gives you the full range of over 4 billion values in 32-bit color. Negative r, g, b values do not make sense and will most likely result in some color inversion. Your version will overflow the data type you have selected.
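For comparison, here is a sketch of the same packing step done entirely with unsigned types, so a fourth byte of 0x80 or above cannot push the result into negative territory. The names `pack_rgba` and `unpack_rgba` and the fixed-width types are assumptions for illustration, not part of the posted code.

```c
#include <stdint.h>

/* Pack four 8-bit channels into one unsigned 32-bit value.
   r ends up in the lowest byte, a in the highest.
   (Hypothetical helper; names are assumptions.) */
uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return  (uint32_t)r
         | ((uint32_t)g << 8)
         | ((uint32_t)b << 16)
         | ((uint32_t)a << 24);
}

/* Inverse of pack_rgba: out[0] = r, out[1] = g, out[2] = b, out[3] = a. */
void unpack_rgba(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)(v >> 24);  /* no mask needed: v is unsigned */
}
```

Because every intermediate value is unsigned, the full 32-bit range survives a pack followed by an unpack.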
Wow. Long thread.
As it seems to have been repeated quite a lot, it's memory overhead of using two arrays vs. processing overhead of casting (even if it's some custom cast). Well? If you've chosen to let the CPU take the heat, then there are many solutions here (i.e. last six pages). However, I suggest you use two arrays. What's a factor of two among programmers?
Last edited by CodeMonkey; 12-22-2008 at 12:46 AM. Reason: grammar
"If you tell the truth, you don't have to remember anything"
-Mark Twain