Any decent compiler would transform a division-by-constant to a multiplication-by-inverse-constant anyway.
It's still not clear why you want to do so much arithmetic on several million pixels.
Why would working with floats be slower than working with quadruples of 4 chars? And why would checking neighbors be easier with 4 chars. If I understand your solution, you will have to look at all 4 chars for each comparison, since (for example) [255, 0, 0, 0] will be closer to [0,255,255,255] than [0,0,0,0] is to [0,0,0,10]. Correct?
If you are starting out with float values to begin with, and are primarily trying to save space, why is EVOEx's solution not preferable? The bitmap can be an array of unions instead of an array of 4-tuples of chars. Initially you would store the float values there. Do whatever processing you need to do on them, and do only a single conversion at the end, replacing the float in each pixel with the 4 chars used to display the image.
What's wrong with that?
the one i posted here on page 3 is probably faster
http://cboard.cprogramming.com/showp...6&postcount=39
this still seems like madness though. why can't you just cast it??
Indeed, and in fact I was relying on this for the code I provided. Interestingly enough though, whether VS2005 and up do this sometimes depends on the floating point consistency model selected for the project. I mean for powers of two it'll surely convert to a multiplication, but for something like dividing by 3 it probably won't use a multiplication by 1/3rd for the precise model, because that can't be represented exactly, whereas 3 can, and the multiplication result might differ by a couple of least-significant bits in the significand.
Yes division is certainly a lot slower than even 3 multiplications. Any reduction in the number of divisions performed has got to be a win. There's no harm in explicitly doing the optimisation of reducing the divides by hand here.
Note that with the method I posted, you don't need any premultiplication step. It should be able to operate on whatever range of values your float initially contains, and maintains accuracy for small and large values. By all means use whatever turns out to be fastest though
Minor optimisation: always do the bitmasks after a right shift. It takes fewer bytes of machine code to represent smaller integer constants. And leaving out the last bitmask is safe so long as val is positive, hence I've made it unsigned to be sure. I'd also leave out the zero-shift even though it doesn't look as pretty and the compiler would have generated the same code anyway, but that's just me.
Code:
float thickness = 678460.545f;
unsigned int val = (unsigned int) (thickness * 1000.0f); //Gives me three decimals which is enough
float div = 1.0f / 255.0f;
float r = (val & 0xFF) * div;
float g = ((val >> 8) & 0xFF) * div;
float b = ((val >> 16) & 0xFF) * div;
float a = (val >> 24) * div;
couldn't you also bitshift by 8 instead of dividing by 256?
because they've made it floating point multiplication? I guess the real answer is that they can't do that because that would shift all the data off the end, and so they'd get nothing.
this whole thing seems odd to me tho. seems like a lot of trouble to somewhat optimize an inherently inefficient system.
You have effectively removed over 2 billion colors by selecting int for vv. Colors are unsigned int, which gives you the full range of over 4 billion values in 32-bit color. Negative r,g,b values do not make sense and will most likely result in some color inversion. Your version will overflow the data type you have selected.
Quote:
byte b1 = (byte) ((int)(r * 255.0f));
byte b2 = (byte) ((int)(g * 255.0f));
byte b3 = (byte) ((int)(b * 255.0f));
byte b4 = (byte) ((int)(a * 255.0f));
int vv = 0;
vv += (int) ((b1 & 0x000000FF) << 0);
vv += (int) ((b2 & 0x000000FF) << 8);
vv += (int) ((b3 & 0x000000FF) << 16);
vv += (int) ((b4 & 0x000000FF) << 24);
float thickness2 = vv * 0.001f;
Wow. Long thread.
As it seems to have been repeated quite a lot, it's memory overhead of using two arrays vs. processing overhead of casting (even if it's some custom cast). Well? If you've chosen to let the CPU take the heat, then there are many solutions here (i.e. last six pages). However, I suggest you use two arrays. What's a factor of two among programmers?
Another array was suggested, but shot down by OP, because it was too... memory consuming, I think.
Nevertheless, I think both ways should be tested and the one that performs the best picked.
I understand you are confused...lost...he is doing something unconventional.. what is he doing... what is he building.. we have the right to KNOW!
A bitmap should be used as a bitmap.. using a bitmap for calculating values? Blasphemy! It was never meant for that! It is bad design, polluting the conventions. What if everybody started doing as they pleased? What then? Yes it would be anarchy and we can't have that! You use two arrays now you hear... like everybody else. Don't go thinking you are special and optimize things.
Meanwhile in a small house in Sweden there sits a programmer, smiling in triumph as he looks down on his unusually optimized code.:)
You may find, however, that the memory saving is not worth the loss of performance in other places. Unless your image/map is hundreds of megabytes (or you have a system that has only a few megabytes of memory), you are most likely better off with two arrays. Saving memory is fine, but it comes at some sort of price. Particularly, modern processors "tag" the data that it's got for use as either floating point or integer data, and when you switch from one to the other, it often takes quite a few extra cycles to process the data.
--
Mats
Like I said, I will try both ways; if it turns out that the conversion only takes one or two seconds extra overall I think it will be worth the saved memory.
This tool I'm working on may be used in an environment where a lot of bitmaps and memory are already used.
I said there may be 4096x4096 pixel bitmaps but it may very well be more than that. As an example Gollum used 20000x20000 pixel textures; that would be something like 1.6 GB of memory. It would be too bad if the guy working on something like that can't use my tool because of lack of memory.;)
I've read this thread three times, and I still am not completely sure I know what you've done here. You've taken a float, and stored in four "short floats" that only go to 255, and then you bring it back to a normal float later?
If that's so, then the "normal" way would require 4096x4096 comparisons, and 4096x4096 divisions. Your way would seem to require 4096x4096 comparisons, 4096x4096x5 divisions, and 4096x4096x4 multiplications (the extra / and * to take the float apart/put it back together). On the face of it that would seem to take 6-7 times as long, unless you've got that disassembling/reassembling down pat.
I don't know how much that would be offset by having to transfer from one array to another at the end, but I would be surprised if you come out ahead.
But I like surprises.
apparently, the float channels go from 0-1. at least that's how it's been implemented.
this is 20% faster on my machine. btw, to be fair, i also modified your code to reassign thickness rather than creating thickness2
Code:
float thickness = 678460.545f;
int val = *(int*) &thickness; //truncating to 3 digits of precision is unnecessary
float div = 1.0f / 255.0f;
// unsigned char, so that bytes with the high bit set don't become negative
float r = (float) ((unsigned char) (val)) * div;
float g = (float) ((unsigned char) (val >> 8)) * div;
float b = (float) ((unsigned char) (val >> 16)) * div;
float a = (float) ((unsigned char) (val >> 24)) * div;
thickness = 0;
// + 0.5f rounds rather than truncates, so each byte survives the float round trip
int result =
    (((int)(255*a + 0.5f)<<24)&0xFF000000) |
    (((int)(255*b + 0.5f)<<16)&0xFF0000) |
    (((int)(255*g + 0.5f)<<8)&0xFF00) |
    ((int)(255*r + 0.5f)&0xFF);
thickness = *(float *)&result;
DrSnuggles, in the bits of code you posted, the r,g,b and a members of Color are public. Is that true in the actual class, or was that for illustration only?
Basically, when the processor loads data from memory, it will know whether the data is intended for floating point, and preprocess it into floating point format. If that turns out to be "wrong", the gain from the preprocessing is lost, and there will be an extra step before the data can be used as integer data. Although perhaps not at the caching phase, but rather when using SSE instructions (for example) to switch between float and integer.
--
Mats
Hmm, I think I get what you are saying. Not completely, but it shows that float to integer isn't a good idea, except if you actually test it and have better results.
Googling I also found this article http://www.mega-nerd.com/FPcast/
which describes some faster float-to-integer conversion methods. So this should be a really helpful optimization for your code, since you will be doing those a lot.
But, like tabstop, I haven't yet figured out what exactly the OP wants to do.
The idea of the optimization makes sense and should be tried, even if it fails. But these I don't get:
1) Why divide by 255 and then multiply by 255?
2) Why not store the float directly in r or g or b or a, since they are already floats?
3) Why can't some pseudo code be given, showing the loop where these conversions happen?
Another note is that you are talking about huge numbers here, 1.6 GB!
Wouldn't it be better to change the color class? Like using unsigned char instead of float for r, g, b, a? You would then have 400 MB. And since the bitmap is that large you might want to think about how you iterate through it.
If you go from beginning to end then it won't all be in the cache at once. So since there will be swaps between main memory and cache, two bitmaps should do the same things faster. I have tried something similar and that is what I found.
What I really don't get is this. You use floats for the colors and an int for the thickness. You convert a float to int by multiplying. Then you convert that int to a float again! If colors were ints then you would have far fewer int-to-float casts.
So, even if you use this method, which seems strange to me, you should change something.
Not to mention that 24/32 bit color is NOT int. It is unsigned int. Go ahead and use int and watch what happens to the colors.
This is what I was getting at. Apparently the OP's using a Bitmap class that has a vector or array of Color objects, and he doesn't want to change the Color class. But in each of his examples, he accessed the r g b and a as public members. If they are public, he can bypass the Color class's mutator method & use a pointer to assign the "thickness" float directly to r (for example), and avoid all the encoding and decoding (and loss of precision).
He doesn't seem to want to answer questions. Oh well, he seems happy with his "optimization".
The color class is not written by me but it is something I have to work with. In my solution I used r,g,b,a freestanding as examples only. In the class those values go from 0.0 to 1.0 and only in steps between 0 and 255 so I can't store any float number there, only steps between 0 and 255. So assigning a value like 0.5f will result in a stored value of 0.4980392 which is (1.0 / 255.0) * 127.
The color class uses float to be able to store high dynamic range color data if needed. How the colors are stored in memory for the bitmap I'm not sure; in the case of not using high dynamic range I believe it is only the access function for the pixels that is returning floats and the memory is stored as bytes. That might not take up 1.6 GB but the important thing is that an additional array of floats in that case would.
Surely you must have something working by now?
What code have you used for the conversions, and how long does it take?
It is pointless to go through any more pages of discussion about theory. We need to see how it works in practice to be of any further help.
Well I have some interesting results.:)
I timed both ways in milliseconds. "Mid time" here stands for the first part of assigning values to the bitmap/array. The second part is after checking the bitmap/array and assigning the final values to the bitmap. The bitmap conversion code was pretty much exactly what I posted in my solution earlier.
Texture size 1024x1024
Bitmap conversion mid time: 50.069981
Bitmap conversion final time: 105.990868
Array mid time: 4.291205
Array final time: 2142.164307
Texture size 2048x2048
Bitmap conversion mid time: 200.142319
Bitmap conversion final time: 418.508026
Array mid time: 23.939423
Array final time: 8596.512695
Texture size 4096x4096
Bitmap conversion mid time: 790.517334
Bitmap conversion final time: 1655.713867
Array mid time: 111.133240
Array final time: 34683.953125
My conclusion is that it is accessing the values in the array that is taking the most time. Since the bitmap only needs to access one row of pixels at a time there are a lot fewer checks. For the array I tried float* and std::vector<float>; both were fairly similar in times. The time for calculating the conversion itself is also a very small part of these times and negligible for the situation.
This means I save both memory and a whole lot of time. Pretty cool.:cool: Goes to show you never know about these things.
What in the name of all that is good did you do to those poor arrays? My computer must be a little slower (or I'm a worse programmer), for my 1024x1024 array mid time was 15 ms, and my 4096x4096 array mid time was around 140 ms. But the total time to finish the 4096x4096 array was only 550 ms. As far as I knew, you were finding the maximum value (which I did as I was writing the array, 'cause why not?) and then dividing all the values in that array by the largest value. Was there something else? Just for completeness, this is what I tested with:
Code:
#include <iostream>
#include <ctime>
#include <cstdlib> // for malloc and free
int main() {
float *bigarray;
bigarray = (float *)malloc(4096L*4096*sizeof(float));
float biggest = -5.0f;
clock_t beginning, middle, end;
beginning = clock();
for (int i = 0; i < 4096; i++) {
for (int j = 0; j < 4096; j++) {
bigarray[i*4096+j] = i*1.0f*j;
if (biggest < bigarray[i*4096+j]) {
biggest = bigarray[i*4096+j];
}
}
}
middle = clock();
for (int i = 0; i < 4096; i++) {
for (int j = 0; j < 4096; j++) {
bigarray[i*4096+j] /= biggest;
}
}
end = clock();
std::cout << "To the middle: " << middle - beginning << std::endl;
std::cout << "To the end: " << end - middle << std::endl;
std::cout << "Oh and time unit: " << CLOCKS_PER_SEC << std::endl;
free(bigarray);
return 0;
}
Ouch man, I'm sorry, my test was not fair to the array, since I was putting the pixels into the bitmap one by one there instead of a whole line at a time. Updated results...
1024x1024
Bitmap conversion mid time: 50.069981
Bitmap conversion final time: 105.990868
Array mid time: 4.291205
Array final time: 56.237366
4096x4096
Bitmap conversion mid time: 790.517334
Bitmap conversion final time: 1655.713867
Array mid time: 104.611938
Array final time: 912.804382
So.. there wasn't a performance gain actually. The difference in time here is most certainly due to putting pixels into the bitmap up until mid time. These times (milliseconds) are so small though that I see no reason not to use the bitmap conversion since the gains in memory can be great.
Storing colors as floats is not about high dynamic range unless the hardware can use floating point textures. Normalizing colors is an ancient art that has been around for a long time. It's far easier to do color calculations on normalized rgba's than it is to do them on concrete values within a certain range.
In ALL of my recent shaders colors are ALWAYS sent to the card as floats. You gain nothing except extra memory consumption when you store bitmaps in memory as floats. High dynamic range comes from post-processing on a scene or image and has nothing to do with how the data is originally stored unless you have some very very specific hardware I've not heard of.
High dynamic range is accomplished by taking an original scene and applying a post-process luminance pass. Then the result is run through a gaussian filter on a floating point texture. The final result is then mixed back into the original scene and the result is rendered to the screen as a screen-sized quad or screen-aligned quad. Some techniques also do a tone map pass and several other passes for different type of post process effects. I suggest you do some googling on high dynamic range rendering and faked high dynamic range rendering. The Shader X series of books all cover this as well as a host of other shader-based books like GPU gems.
Textures when sent to the video card are already converted into floats but you gain NOTHING from this except ease of calculation. You cannot represent an infinite amount of values between 0 and 255 just because you use floats and you also do not gain ANY color depth unless your system supports floating point textures and/or render targets. Your video card would have to support a floating point primary buffer in order to actually gain color depth. This also would depend on your display's ability to reproduce these discrete values. Even IF your card does support a floating point primary buffer there is still a finite amount of color representations available.
I'm still completely lost as to what you gain here, why you need it and why it is any different from what has been done in games since DirectX 9 and floating point textures were introduced. Even in DirectX 8 at the shader level colors were represented as 4 floats. I see nothing revolutionary here.
If you could store an infinite amount of data in a floating point array then I would store my terrain height maps as floats. However storing them as unsigned char arrays yields plenty of information. All I do is then scale the data in the array to arrive at the final world height. The theory is the same whether it be color or heights you are representing. Arrays, i.e. textures, have a finite resolution and a finite color depth. Nothing you do can or will change that. The only way I know of to get extremely smooth hills (i.e. extremely smooth color variations in your case) is to filter the data in the array.
Also you are killing your bus here by forcing it to pass in 4 floats that are 32 bits each. That is 128 bits of data per texel going across the bus to the video hardware as opposed to 4 bytes or 32 bits. You are passing 4 times more data across the video bus than you need to and with no benefit.
With the main slowdown and bottleneck these days being the video bus I fail to see how this is a smart move. If you want high dynamic range rendering then do it on the card and not in software.
The reason he does what he does is so he won't have to allocate another buffer for his per-pixel float value (thickness). Instead he stores it in the bitmap memory (since those colors are of no further use after the thickness has been calculated).
Thickness and color? Ok. Whatever. There are well developed theories and implementations that already solve all of this.
Well, I don't know how much you are allowed to change. But if you could change the color class you could store the float value directly. I mean, don't use the normal functions to set a color.
So, do you have access to the color class? Are you allowed to add things to it?
Bubba - Sounds to me like you are referring to high dynamic range applications for games, which I'm sure involve a lot of post processing. 3d applications though use bitmaps in formats like .hdr, which store floating point data that is used to illuminate a scene when rendering in the 3d application. That is the environment I'm working in, so that is why the color class is using float values.
C_ntua - I won't be able to change the color class myself. I could write my own class but that would just add a level of indirection.
Then maybe you can hope that the variables are protected and not private. In which case you can create a derived class from the base COLOR class. You can then create a function that sets what you want and use the derived class instead of the base class. If the variables are private and you cannot declare them as protected... I have no more ideas :)