float13

**tabstop** · 12-17-2008

Originally Posted by esbo

Anyway your explanation does not explain the results for the program, does it?

Of course it does. 0.1f is not 0.1. 0.1f is 0.000110011001100110011001101b in binary (which is actually represented internally in "binary scientific notation" as 1.10011001100110011001101 x 2^-4). Similarly 1.3f is not 1.3, it is 1.01001100110011001100110b. Add them together, round the exact sum to the nearest float, and you get 1.01100110011001100110011b, which evaluates to 1.39999997615814208984375, slightly less than 1.4.

**laserlight** · 12-17-2008

Originally Posted by esbo

Well they might have wrote the standard, but I will bet you a pound to a penny they didn't
write the compiler!!

I do not know if anyone from that standards committee happens to have written that compiler, but in any case I think that the C standard does not mandate IEEE754 for floating point implementations, in which case QuantumPete would merely be making an educated guess. Nonetheless floating point inaccuracy would exist in any floating point representation.

Originally Posted by esbo

Anyway your explanation does not explain the results for the program, does it?

It might, but as Dino has hinted there may be other factors like flawed hardware.

Originally Posted by esbo

If so explain it in detail and how the error occurs whence it does.

QuantumPete has already provided such an explanation: due to floating point inaccuracy, the truncation could result in X being assigned 13, but when x is printed, it is printed as 1.4 even though the stored value is marginally less than that. This then repeats until i == 20 since X would be assigned 13 on the next iteration, so on and so forth.

EDIT:
Oh, I am wrong: C99 does mandate support for IEC 60559, which is another name for IEEE754, except that long double could be implemented with a non-IEC 60559 extended format.

EDIT #2:
Heh, maybe I was just a little too excited at discovering Annex F. Apparently what I mentioned earlier comes with the caveat that "an implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex."

**matsp** · 12-17-2008

Originally Posted by esbo

Well they might have wrote the standard, but I will bet you a pound to a penny they didn't
write the compiler!!
Anyway your explanation does not explain the results for the program, does it?
If so explain it in detail and how the error occurs whence it does.

And how would the compiler CHANGE what happens in the floating point processor [aside from marginally, in the sense that the compiler obviously will decide the order and type of instructions issued - but I'm fairly sure that is not the problem here]?

The problem here is not that floating point instructions are issued in the wrong way, but rather that numbers like 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9 cannot be represented PRECISELY in binary floating point. Just like in decimal, where 1/3 can never be written precisely - you can get close with (say) 20, 100, 1 million or a gazillion threes following the initial 0.3, but it's still an approximation. Likewise, 0.1 in standard floating point can only be stored as something like 0.0999999999 or 0.1000000001 - the more bits the floating point format supports, the closer we get, but it will never be exactly 0.1.

When converting from floating point to integer, the compiler is obliged to chop off the fractional part with no rounding whatsoever - so given the right circumstances, we can end up one lower than expected - it is clearly shown in the example above.

Of course, depending on:
1. The processor (floating point unit) design
2. Compiler settings (e.g. optimization options - but also other settings may change the code-generation)
3. Compiler vendor.
4. The exact formulation of the code.
the result may vary - an x86 compiler may, for example, keep values in floating point registers rather than storing and loading temporary values from memory for every operation, and since the x86 floating point registers have 80-bit precision, intermediate calculations [1] are done with high precision and then rounded at the time they are stored out to memory. This particularly affects "small values added and then subtracted from a much larger value", as the intermediate result will be more PRECISE, whilst if it's stored out, it gets much less precise.

[1] Subject to the precision control setting in the FPU control word - we can set the floating point processor to round to single (32-bit) or double (64-bit) precision after each operation.

--
Mats

**esbo** · 12-17-2008

Originally Posted by QuantumPete

Actually, it would be the committee that came up with the IEEE floating point standard. Floating point numbers are approximations, because decimal values cannot be accurately represented using binary. Thus 0.3 is actually .299999999999. If you then multiply that by 10 say (2.9999999999) and cast to an int, you get 2 (because int cast truncates), not 3, as you'd expect. This is a common occurrence in programs that mix floats and ints, which is why it's generally a good idea to avoid using floats or doubles whenever possible.
There was an in-depth discussion of this a couple of months back, if you want to scour the archive.

QuantumPete

OK can anyone explain why the error occurs *when* it does?
In detail, not a general 'ramble'?

**tabstop** · 12-17-2008

Originally Posted by esbo

OK can anyone explain why the error occurs *when* it does?
In detail, not a general 'ramble'?

Because that's the first time that the rounding rules (round to nearest) cause a round down. I did the 1.3 to 1.4 case up above; I don't feel like typing 32 bits for each of the numbers in between.

**C_ntua** · 12-17-2008

I think matsp explains it well.
So assigning a float to an int is undefined, right? And you can do that without a cast. Isn't that bad, as a language design? Shouldn't a cast be necessary?

**matsp** · 12-17-2008

Originally Posted by esbo

OK can anyone explain why the error occurs *when* it does?
In detail, not a general 'ramble'?

Aside from tabstop's two replies, I suppose all of the above is rambling, because it tries to explain in words how it works, rather than type out the binary values. Well done tabstop for making that effort.

However, it is critical to understand that these errors can happen at any time, given the right values - converting floating point to integer will chop off any decimals - so the integer value may be one less than what you'd expect it to be, unless you specifically add on a small amount to ensure that you are rounding it the right way (how small a value depends on what you want to achieve).

--
Mats

**laserlight** · 12-17-2008

Originally Posted by C_ntua

So assigning a float to an int is undefined, right?

It is defined: "When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero)."

There is an undefined behaviour caveat though: "If the value of the integral part cannot be represented by the integer type, the behavior is undefined."

**MK27** · 12-17-2008

Originally Posted by C_ntua

I think matsp explains it well.
So assigning a float to an int is undefined, right? And you can do that without a cast. Isn't that bad, as a language design? Shouldn't a cast be necessary?

The way I understand it, a cast wouldn't make any difference anyway, and there can be no way out of the problem unless you have a computer that uses millions of tiny elves with 0-9 on their jerseys instead of electricity.

Relating this to the "undefined" nature of pi, you know there are people out there who want it set to 3.14 by law

**matsp** · 12-17-2008

Originally Posted by C_ntua

I think matsp explains it well.
So assigning a float to an int is undefined, right? And you can do that without a cast. Isn't that bad, as a language design? Shouldn't a cast be necessary?

It is not UNDEFINED - it is perfectly defined (as long as the integer result fits in the type it is converted to - values of 2^31 or more do not fit in a 32-bit signed integer, for example, and converting them is undefined): the value is truncated - that means that all decimals are "removed". However, what makes it difficult is the fact that floating point values aren't precise - 1.4 is in fact 1.39999998 (approximately), and when we multiply by ten and truncate it, we get 13, not 14. Divide that by 10 and add 0.1, and we're back at 1.39999998, and on we go.

And it's correct that integer and float can be converted back and forth without casts (although some compilers may warn about such implicit conversions if there is a risk of loss of data - e.g. float -> integer may drop decimal places, and thus lose the original meaning of the value, whilst integer -> float should be acceptable most of the time [although the float may not be as precise as the integer value]).

Edit: By rambling too much, I got beaten by laserlight's exact quotes from the standard.

--
Mats

**laserlight** · 12-17-2008

Originally Posted by MK27

there can be no way out of the problem unless you have a computer that uses millions of tiny elves with 0-9 on their jerseys instead of electricity.

Hence the way out of the problem is to use a decimal representation with sufficient precision.

**C_ntua** · 12-17-2008

Yeah, but do you know that 14 in floating representation is less or equal to 14? Or can it be also greater? Meaning being 14.000200031 for example. Because in that case the decimal part would be dropped, but how would you know what is the integral and what is the decimal value? It would be defined, but you would need to do the math, which might not be possible.
Or am I missing something?

Lol, tiny elves. You could use an "analog" computer instead of a normal digital one and still use electricity. Too expensive? Sell the elves. Too slow? Have the elves make you an efficient one. Your elf solution is inefficient and against their rights

**esbo** · 12-17-2008

Originally Posted by tabstop

Because that's the first time that the rounding rules (round to nearest) cause a round down. I did the 1.3 to 1.4 case up above; I don't feel like typing 32 bits for each of the numbers in between.

It's not the first time though, is it? It is the second time as far as I can see; the first
round down occurs on line 7.
Hence the explanations so far seem to fall into the 'general ramble' category?

Code:




 1 0.1000000015  0   2 0.2000000030 
 2 0.2000000030  1   3 0.3000000119 
 3 0.3000000119  2   4 0.4000000060 
 4 0.4000000060  3   5 0.5000000000 
 5 0.5000000000  4   6 0.6000000238 
 6 0.6000000238  5   7 0.7000000477 
 7 0.6999999881  6   8 0.8000000119 <-----------------------------:O)
 8 0.8000000119  7   9 0.9000000358 
 9 0.8999999762  8  10 1.0000000000 
10 1.0000000000  9  11 1.1000000238 
11 1.1000000238 10  12 1.2000000477 
12 1.2000000477 11  13 1.3000000715 
13 1.2999999523 12  13 1.3999999762 
13 1.2999999523 13  13 1.3999999762 
13 1.2999999523 14  13 1.3999999762 
13 1.2999999523 15  13 1.3999999762 
13 1.2999999523 16  13 1.3999999762 
13 1.2999999523 17  13 1.3999999762 
13 1.2999999523 18  13 1.3999999762 
13 1.2999999523 19  13 1.3999999762

**tabstop** · 12-17-2008

Originally Posted by esbo

It's not the first time though, is it? It is the second time as far as I can see; the first
round down occurs on line 7.

Code:




 1 0.1000000015  0   2 0.2000000030 
 2 0.2000000030  1   3 0.3000000119 
 3 0.3000000119  2   4 0.4000000060 
 4 0.4000000060  3   5 0.5000000000 
 5 0.5000000000  4   6 0.6000000238 
 6 0.6000000238  5   7 0.7000000477 
 7 0.6999999881  6   8 0.8000000119 
 8 0.8000000119  7   9 0.9000000358 
 9 0.8999999762  8  10 1.0000000000 
10 1.0000000000  9  11 1.1000000238 
11 1.1000000238 10  12 1.2000000477 
12 1.2000000477 11  13 1.3000000715 
13 1.2999999523 12  13 1.3999999762 
13 1.2999999523 13  13 1.3999999762 
13 1.2999999523 14  13 1.3999999762 
13 1.2999999523 15  13 1.3999999762 
13 1.2999999523 16  13 1.3999999762 
13 1.2999999523 17  13 1.3999999762 
13 1.2999999523 18  13 1.3999999762 
13 1.2999999523 19  13 1.3999999762

So as you can see, 0.1f is slightly bigger than 0.1 (the last 15 there). So even though 0.7f is represented by a number less than 0.7 (0.6999999881), 0.6f + 0.1f is still bigger than 0.7 (0.7000000477). So all we've proved is that 0.6f + 0.1f does not actually equal 0.7f.

**tabstop** · 12-17-2008

Originally Posted by C_ntua

Yeah, but do you know that 14 in floating representation is less or equal to 14? Or can it be also greater? Meaning being 14.000200031 for example. Because in that case the decimal part would be dropped, but how would you know what is the integral and what is the decimal value? It would be defined, but you would need to do the math, which might not be possible.
Or am I missing something?

Lol, tiny elves. You could use an "analog" computer instead of a normal digital one and still use electricity. Too expensive? Sell the elves. Too slow? Have the elves make you an efficient one. Your elf solution is inefficient and against their rights

Calm down, breathe, and ask again, because I have no idea what you're asking at the top. (Note that all the integers up to 2^24 are exact floats.)
