Any decent compiler would transform a division-by-constant to a multiplication-by-inverse-constant anyway.
It's still not clear why you want to do so much arithmetic on several million pixels.
Why would working with floats be slower than working with quadruples of 4 chars? And why would checking neighbors be easier with 4 chars. If I understand your solution, you will have to look at all 4 chars for each comparison, since (for example) [255, 0, 0, 0] will be closer to [0,255,255,255] than [0,0,0,0] is to [0,0,0,10]. Correct?
If you are starting out with float values to begin with, and are primarily trying to save space, why is EVOEx's solution not preferable? The bitmap can be an array of unions instead of an array of 4-tuples of chars. Initially you would store the float values there. Do whatever processing you need to do on them, and do only a single conversion at the end, replacing the float in each pixel with the 4 chars used to display the image.
What's wrong with that?
the one i posted here on page 3 is probably faster
http://cboard.cprogramming.com/showp...6&postcount=39
this still seems like madness though. why can't you just cast it??
Indeed, and in fact I was relying on this for the code I provided. Interestingly enough though, whether VS2005 and up do this sometimes depends on the floating point consistency model selected for the project. I mean for powers of two it'll surely convert to a multiplication, but for something like dividing by 3 it probably won't use a multiplication by 1/3rd for the precise model, because that can't be represented exactly, whereas 3 can, and the multiplication result might differ by a couple of least-significant bits in the significand.
Yes division is certainly a lot slower than even 3 multiplications. Any reduction in the number of divisions performed has got to be a win. There's no harm in explicitly doing the optimisation of reducing the divides by hand here.
Note that with the method I posted, you don't need any premultiplication step. It should be able to operate on whatever range of values your float initially contains, and maintains accuracy for small and large values. By all means use whatever turns out to be fastest though
Minor optimisation: always do the bitmasks after a right shift. It takes fewer bytes of machine code to represent smaller integer constants. And leaving out the last bitmask is safe so long as val is positive, hence I've made it unsigned to be sure. I'd also leave out the zero-shift even though it doesn't look as pretty and the compiler would have generated the same code anyway, but that's just me.
Code:
float thickness = 678460.545f;
unsigned int val = (unsigned int) (thickness * 1000.0f); //Gives me three decimals which is enough
float div = 1.0f / 255.0f;
float r = (val & 0xFF) * div;
float g = ((val >> 8) & 0xFF) * div;
float b = ((val >> 16) & 0xFF) * div;
float a = (val >> 24) * div;
couldn't you also bitshift by 8 instead of dividing by 256?
because they've made it floating point multiplication? I guess the real answer is that they can't do that because that would shift all the data off the end, and so they'd get nothing.
this whole thing seems odd to me tho. seems like a lot of trouble to somewhat optimize an inherently inefficient system.
You have effectively removed over 2 billion colors by selecting int for vv. Colors are unsigned int, which gives you the full range of over 4 billion values in 32-bit color. Negative r,g,b values do not make sense and will most likely result in some color inversion. Your version will overflow the data type you have selected.
Quote:
byte b1 = (byte) ((int)(r * 255.0f));
byte b2 = (byte) ((int)(g * 255.0f));
byte b3 = (byte) ((int)(b * 255.0f));
byte b4 = (byte) ((int)(a * 255.0f));
int vv = 0;
vv += (int) ((b1 & 0x000000FF) << 0);
vv += (int) ((b2 & 0x000000FF) << 8);
vv += (int) ((b3 & 0x000000FF) << 16);
vv += (int) ((b4 & 0x000000FF) << 24);
float thickness2 = vv * 0.001f;
Wow. Long thread.
As it seems to have been repeated quite a lot, it's memory overhead of using two arrays vs. processing overhead of casting (even if it's some custom cast). Well? If you've chosen to let the CPU take the heat, then there are many solutions here (i.e. last six pages). However, I suggest you use two arrays. What's a factor of two among programmers?
Another array was suggested, but shot down by OP, because it was too... memory consuming, I think.
Nevertheless, I think both ways should be tested and the one that performs the best picked.
I understand you are confused...lost...he is doing something unconventional.. what is he doing... what is he building.. we have the right to KNOW!
A bitmap should be used as a bitmap.. using a bitmap for calculating values? Blasphemy! It was never meant for that! It is bad design, polluting the conventions. What if everybody started doing as they pleased? What then? Yes it would be anarchy and we can't have that! You use two arrays now you hear... like everybody else. Don't go thinking you are special and optimize things.
Meanwhile in a small house in Sweden there sits a programmer, smiling in triumph as he looks down on his unusually optimized code.:)
You may find, however, that the memory saving is not worth the loss of performance in other places. Unless your image/map is hundreds of megabytes (or you have a system that has only a few megabytes of memory), you are most likely better off with two arrays. Saving memory is fine, but it comes at some sort of price. Particularly, modern processors "tag" the data that it's got for use as either floating point or integer data, and when you switch from one to the other, it often takes quite a few extra cycles to process the data.
--
Mats
Like I said, I will try both ways; if it turns out that the conversion only takes one or two seconds extra overall I think it will be worth the saved memory.
This tool I'm working on may be used in an environment where a lot of bitmaps and memory are already used.
I said there may be 4096x4096 pixel bitmaps but it may very well be more than that. As an example Gollum used 20000x20000 pixel textures; that would be something like 1.6 GB of memory. It would be too bad if the guy working on something like that can't use my tool because of lack of memory.;)
I've read this thread three times, and I still am not completely sure I know what you've done here. You've taken a float, and stored in four "short floats" that only go to 255, and then you bring it back to a normal float later?
If that's so, then the "normal" way would require 4096x4096 comparisons, and 4096x4096 divisions. Your way would seem to require 4096x4096 comparisons, 4096x4096x5 divisions, and 4096x4096x4 multiplications (the extra / and * to take the float apart/put it back together). On the face of it that would seem to take 6-7 times as long, unless you've got that disassembling/reassembling down pat.
I don't know how much that would be offset by having to transfer from one array to another at the end, but I would be surprised if you come out ahead.
But I like surprises.
apparently, the float channels go from 0-1. at least that's how it's been implemented.
this is 20% faster on my machine. btw, to be fair, i also modified your code to reassign thickness rather than creating thickness2
Code:
float thickness = 678460.545f;
int val = *(int*) &thickness; //truncating to 3 digits of precision is unnecessary
float div = 1.0f / 255.0f;
// unsigned char, so that bytes with the high bit set don't become negative
float r = (float) ((unsigned char) (val)) * div;
float g = (float) ((unsigned char) (val >> 8)) * div;
float b = (float) ((unsigned char) (val >> 16)) * div;
float a = (float) ((unsigned char) (val >> 24)) * div;
thickness = 0;
// + 0.5f rounds rather than truncates, so each byte survives the float round trip
int result =
    (((int)(255*a + 0.5f)<<24)&0xFF000000) |
    (((int)(255*b + 0.5f)<<16)&0xFF0000) |
    (((int)(255*g + 0.5f)<<8)&0xFF00) |
    ((int)(255*r + 0.5f)&0xFF);
thickness = *(float *)&result;
DrSnuggles, in the bits of code you posted, the r,g,b and a members of Color are public. Is that true in the actual class, or was that for illustration only?
Basically, when the processor loads data from memory, it will know whether the data is intended for floating point, and preprocess it into floating point format. If that turns out to be "wrong", the gain from the preprocessing is lost, and there will be an extra step before the data can be used as integer data. Although perhaps not at the caching phase, but rather when using SSE instructions (for example) to switch between float and integer.
--
Mats
Hmm, I think I get what you are saying. Not completely, but it shows that float to integer isn't a good idea, except if you actually test it and have better results.
Googling I also found this article http://www.mega-nerd.com/FPcast/
which describes some faster float-to-integer conversion methods. So this should be a really helpful optimization for your code, since you will be doing those a lot.
But, like tabstop, I haven't yet figured out what exactly the OP wants to do.
The idea of the optimization makes sense and should be tried, even if it fails. But these I don't get:
1) Why divide by 255 and then multiply by 255?
2) Why not store the float directly in r or g or b or a, since they are already floats?
3) Why can't some pseudo code be given, showing the loop where these conversions happen?
Another note is that you are talking about huge numbers here, 1.6 GB!
Wouldn't it be better to change the color class? Like using unsigned char instead of float for r, g, b, a? You would then have 400 MB. And since the bitmap is that large you might want to think about how you iterate through it.
If you go from beginning to end then it won't all be in the cache at once. So since there will be swaps between main memory and cache, two bitmaps should do the same things faster. I have tried something similar and that is what I found.
What I really don't get is this. You use floats for the colors and an int for the thickness. You convert a float to int by multiplying. Then you convert that int to a float again! If colors were ints then you would have far fewer int-to-float casts.
So, even if you use this method, which seems strange to me, you should change something.
Not to mention that 24/32 bit color is NOT int. It is unsigned int. Go ahead and use int and watch what happens to the colors.
This is what I was getting at. Apparently the OP's using a Bitmap class that has a vector or array of Color objects, and he doesn't want to change the Color class. But in each of his examples, he accessed the r g b and a as public members. If they are public, he can bypass the Color class's mutator method & use a pointer to assign the "thickness" float directly to r (for example), and avoid all the encoding and decoding (and loss of precision).
He doesn't seem to want to answer questions. Oh well, he seems happy with his "optimization".
The color class is not written by me but it is something I have to work with. In my solution I used r,g,b,a freestanding as examples only. In the class those values go from 0.0 to 1.0 and only in steps between 0 and 255 so I can't store any float number there, only steps between 0 and 255. So assigning a value like 0.5f will result in a stored value of 0.4980392 which is (1.0 / 255.0) * 127.
The color class uses float to be able to store high dynamic range color data if needed. How the colors are stored in memory for the bitmap I'm not sure; in the case of not using high dynamic range I believe it is only the access function for the pixels that is returning floats and the memory is stored as bytes. That might not take up 1.6 GB but the important thing is that an additional array of floats in that case would.
Surely you must have something working by now?
What code have you used for the conversions, and how long does it take?
It is pointless to go through any more pages of discussion about theory. We need to see how it works in practice to be of any further help.
Well I have some interesting results.:)
I timed both ways in milliseconds. "Mid time" here stands for the first part of assigning values to the bitmap/array. The second part is after checking the bitmap/array and assigning the final values to the bitmap. The bitmap conversion code was pretty much exactly what I posted in my solution earlier.
Texture size 1024x1024
Bitmap conversion mid time: 50.069981
Bitmap conversion final time: 105.990868
Array mid time: 4.291205
Array final time: 2142.164307
Texture size 2048x2048
Bitmap conversion mid time: 200.142319
Bitmap conversion final time: 418.508026
Array mid time: 23.939423
Array final time: 8596.512695
Texture size 4096x4096
Bitmap conversion mid time: 790.517334
Bitmap conversion final time: 1655.713867
Array mid time: 111.133240
Array final time: 34683.953125
My conclusion is that it is accessing the values in the array that is taking the most time. Since the bitmap only needs to access one row of pixels at a time there are a lot fewer checks. For the array I tried float* and std::vector<float>; both were fairly similar in times. The time for calculating the conversion itself is also a very small part of these times and negligible for the situation.
This means I save both memory and a whole lot of time. Pretty cool.:cool: Goes to show you never know about these things.
What in the name of all that is good did you do to those poor arrays? My computer must be a little slower (or I'm a worse programmer), for my 1024x1024 array mid time was 15 ms, and my 4096x4096 array mid time was around 140 ms. But the total time to finish the 4096x4096 array was only 550 ms. As far as I knew, you were finding the maximum value (which I did as I was writing the array, 'cause why not?) and then dividing all the values in that array by the largest value. Was there something else? Just for completeness, this is what I tested with:
Code:
#include <iostream>
#include <ctime>
#include <cstdlib> // for malloc and free
int main() {
float *bigarray;
bigarray = (float *)malloc(4096L*4096*sizeof(float));
float biggest = -5.0f;
clock_t beginning, middle, end;
beginning = clock();
for (int i = 0; i < 4096; i++) {
for (int j = 0; j < 4096; j++) {
bigarray[i*4096+j] = i*1.0f*j;
if (biggest < bigarray[i*4096+j]) {
biggest = bigarray[i*4096+j];
}
}
}
middle = clock();
for (int i = 0; i < 4096; i++) {
for (int j = 0; j < 4096; j++) {
bigarray[i*4096+j] /= biggest;
}
}
end = clock();
std::cout << "To the middle: " << middle - beginning << std::endl;
std::cout << "To the end: " << end - middle << std::endl;
std::cout << "Oh and time unit: " << CLOCKS_PER_SEC << std::endl;
free(bigarray);
return 0;
}
Ouch man, I'm sorry, my test was not fair to the array, since I was putting the pixels into the bitmap one by one there instead of a whole line at a time. Updated results...
1024x1024
Bitmap conversion mid time: 50.069981
Bitmap conversion final time: 105.990868
Array mid time: 4.291205
Array final time: 56.237366
4096x4096
Bitmap conversion mid time: 790.517334
Bitmap conversion final time: 1655.713867
Array mid time: 104.611938
Array final time: 912.804382
So.. there wasn't a performance gain actually. The difference in time here is most certainly due to putting pixels into the bitmap up until mid time. These times (milliseconds) are so small though that I see no reason not to use the bitmap conversion since the gains in memory can be great.
Storing colors as floats is not about high dynamic range unless the hardware can use floating point textures. Normalizing colors is an ancient art that has been around for a long time. It's far easier to do color calculations on normalized rgba's than it is to do them on concrete values within a certain range.
In ALL of my recent shaders colors are ALWAYS sent to the card as floats. You gain nothing except extra memory consumption when you store bitmaps in memory as floats. High dynamic range comes from post-processing on a scene or image and has nothing to do with how the data is originally stored unless you have some very very specific hardware I've not heard of.
High dynamic range is accomplished by taking an original scene and applying a post-process luminance pass. Then the result is run through a gaussian filter on a floating point texture. The final result is then mixed back into the original scene and the result is rendered to the screen as a screen-sized quad or screen-aligned quad. Some techniques also do a tone map pass and several other passes for different type of post process effects. I suggest you do some googling on high dynamic range rendering and faked high dynamic range rendering. The Shader X series of books all cover this as well as a host of other shader-based books like GPU gems.
Textures when sent to the video card are already converted into floats but you gain NOTHING from this except ease of calculation. You cannot represent an infinite amount of values between 0 and 255 just because you use floats and you also do not gain ANY color depth unless your system supports floating point textures and/or render targets. Your video card would have to support a floating point primary buffer in order to actually gain color depth. This also would depend on your display's ability to reproduce these discrete values. Even IF your card does support a floating point primary buffer there is still a finite amount of color representations available.
I'm still completely lost as to what you gain here, why you need it and why it is any different from what has been done in games since DirectX 9 and floating point textures were introduced. Even in DirectX 8 at the shader level colors were represented as 4 floats. I see nothing revolutionary here.
If you could store an infinite amount of data in a floating point array then I would store my terrain height maps as floats. However storing them as unsigned char arrays yields plenty of information. All I do is then scale the data in the array to arrive at the final world height. The theory is the same whether it be color or heights you are representing. Arrays, i.e. textures, have a finite resolution and a finite color depth. Nothing you do can or will change that. The only way I know of to get extremely smooth hills (i.e. extremely smooth color variations in your case) is to filter the data in the array.
Also you are killing your bus here by forcing it to pass in 4 floats that are 32 bits each. That is 128 bits of data per texel going across the bus to the video hardware as opposed to 4 bytes or 32 bits. You are passing 4 times more data across the video bus than you need to and with no benefit.
With the main slowdown and bottleneck these days being the video bus I fail to see how this is a smart move. If you want high dynamic range rendering then do it on the card and not in software.
The reason he does what he does is so he won't have to allocate another buffer for his per-pixel float value (thickness). Instead he stores it in the bitmap memory (since those colors are of no further use after the thickness has been calculated).
Thickness and color? Ok. Whatever. There are well developed theories and implementations that already solve all of this.
Well, I don't know how much you are allowed to change. But if you could change the color class you could store the float value directly. I mean, don't use the normal functions to set a color.
So, do you have access to the color class? Are you allowed to add things to it?
Bubba - Sounds to me like you are referring to high dynamic range applications for games, which I'm sure involve a lot of post processing. 3d applications though use bitmaps in formats like .hdr, which store floating point data that is used to illuminate a scene when rendering in the 3d application. That is the environment I'm working in, so that is why the color class is using float values.
C_ntua - I won't be able to change the color class myself. I could write my own class but that would just add a level of indirection.
Then maybe you can hope that the variables are protected and not private. In which case you can create a derived class from the base COLOR class. You can then create a function that sets what you want and use the derived class instead of the base class. If the variables are private and you cannot declare them as protected... I have no more ideas :)