I could use directions to a good, complete explanation of the different types of floating point variables, esp w regard to how they are stored in memory (how many bits) and how many significant digits I can expect from each type, why the claimed range of values is so much larger than the actual number of significant digits can possibly support, and what this has to do with how many bits they are stored in.
For example, I've read that a variable declared as float var_name (1) only handles decimal numbers to about 7 significant digits, yet (2) requires 2 words (64 bits) for storage, and (3) is said to have a "range" of 3.4E-38 to 3.4E+38.
What is the relationship between factors 1, 2, and 3?
I don't expect anyone to answer it here, rather, I was just thinking that someone might be able to point me to something on-line that provides a good explanation of the relationship of 1, 2, and 3.
Based on your description, you're talking about the x86 architecture. You will probably be able to find the answers to your questions in an Intel white paper that describes the floating point registers. If I remember correctly, Intel has copious amounts of white papers available at their website (www.intel.com).
Sorry I haven't provided a more specific link, but I seem to be unable to reach the Intel website from my PC at work.
I should have added, when I wrote my message above (before I registered), that I wanted this info as background for making practical choices about what kind (what "type") of variable to use when writing a program in C. I want to make sure that (1) I have enough significant digits when any variable is eventually printed out as digits (or used to specify the size or number of a value that controls any kind of output), and (2) the compiled C program doesn't take a lot longer to run because the processor was kept busy processing, over and over, many more significant digits than were needed to arrive at sufficient precision at output time.
So I don't necessarily need to know all the details, at this time, of how the processor deals with the data -- but I am trying to do what I can to make sure my outputs don't have either (1) meaningless digits, and excessive time in compilation or execution, or (2) inadequate digits and imprecise "answers" (or drawings or whatever). And the numbers, as I have learned them so far, don't give me any sense whatsoever of how to choose whether to use float var_name or double var_name when declaring variables.
float: 7 significant digits; it requires 64 bits in memory to identify every possible real number that has 7 significant digits. Why? And why is this type of variable said to have a "range" of 3.4E-38 to 3.4E+38? (I understand "scientific" notation, by the way.) The relationship between these three facts isn't "gelling" for me.
I am planning on getting back to learning assembly language soon, but for now I just wanted an answer relative to C.
> how many significant digits I can expect from each type
Look in float.h
> why the claimed range of values is so much larger than the actual number of significant digits
The key part is significant digits
You might think you're storing
in a float, and it will quite happily fit, but floats are approximations which only guarantee to store the 6 (in the case of float, 15 for double) most significant digits with any accuracy.
Anything after that is just noise, and is not to be taken seriously.
For instance, adding 1 to the above number will have no effect, because 1 isn't large enough to affect the smallest significant digit.
If you really want to store all 30 decimal digits with absolute accuracy, and be able to add 1 to it to get
Then none of the floating point types (float, double, long double) will do, and you need something like
For floating point computations in C, there's one rule of thumb: use 'double's.
This type usually has about 14 significant digits, and a dynamic range between 1e-300 and 1e300, which is usually more than enough. Moreover, at least on x86 systems, the basic functions on floating point values for floats, doubles and long doubles take the same amount of time (assuming they are in cache memory).
There are of course exceptions to the rule... If you need more precision you could use long doubles (10+2 bytes), but on x86 you gain only 3 digits! If you need large amounts of data (much more than the cache size), AND you access it sequentially, then you may be able to make your computation faster by using floats (4 bytes on x86). Finally, if you want to conserve memory you could opt for floats.
floats are approximations which only guarantee to store the 6 (in the case of float, 15 for double) most significant digits with any accuracy.
OK, I see what's going on now Salem. Just because the largest value that the variable can have is 3.4E+38 and the smallest value is 3.4E-38, doesn't mean that there are 2 *(3.4E+38) distinct values that can be stored in the variable. There are still only about 10,000,000 distinct values (the number of values that can be identified with 7 digits) that can be stored in the variable (one value at a time). And the reliability of the least significant digit, the 7th digit, may be "iffy," due to the fact that there is no digit less significant than it, no 8th digit, I guess, to be used to decide whether to round the 7th digit up or down.
For floating point computations in C, there's one rule of thumb: use 'double's.
This type usually has about 14 significant digits, and a dynamic range between 1e-300 and 1e300, which is usually more than enough. Moreover, at least on x86 systems, the basic functions on floating point values for floats, doubles and long doubles take the same amount of time (assuming they are in cache memory).
Hmm, the Turbo C help file which said:
Variables of type double are four words in
length. Their range is 1.7E-308 to 1.7E+308.
Variables of type long double are five words
in length. Their range is 3.4E-4932 to
Used the term "range" instead of "dynamic range." But I think "dynamic range" sounds like a better description of what we have here.
I guess I don't really have to know why, exactly, double type variables are stored in 128 bits (4 words), have a range as specified, and can store the number of distinct values that can be described by 14 or so digits (I'm guessing it is actually the number of binary or hexadecimal digits that decides the number of distinct values) -- as long as I know what the dynamic range and number of significant digits are, and keep the number of significant digits in mind when I decide upon the order, or precedence, of operations performed on them (since certain orders of operation result in less loss of significance than others).
just a little extra info
Just a little extra info on why the precision is not there.
Say you have a float that takes 32 bits,
1 bit is used to determine the sign (+ or -)
8 bits are used to determine the exponent (# x 2^exponent, where # is determined by the last 23 bits)
The formula to convert a float to decimal is as follows:
Decimal Value = (-1)^s x 1.fraction x 2^(exp-127)
where s is the 1st bit, fraction is the last 23 bits, and exp are the 8 bits in between.
So, for example
s exp      fraction
1 10000001 10101000000000000000000
Convert the exponent to decimal: 10000001 = 129
and then substitute
(-1)^1 x 1.10101000000000000000000 x 2^(129-127)
-1.10101 x 2^2 = -110.101 (binary)
Converting to decimal yields -6.625
Hope this helps, sorry if it doesn't.
Hmmm. I'm gonna have to study that for a little while, unregistered-person.
Hmm, I remember seeing a quite similar explanation, to the explanation that unregistered-person posted above, before -- many years ago -- possibly when I started studying Atari 8-bit assembler, about 13 years ago. While I still haven't figured out exactly what unregistered-person said, above, I remember understanding what it was that I read many years ago. I don't remember my understanding of it; but I remember understanding it -- that isn't too difficult a distinction to grasp, is it?
verb: remember that turbo C is a 16-bit compiler, so in their help-files they use "word" for a 16-bit integer... A double is 64 bits in length and a long double 80 bits.
remember that turbo C is a 16-bit compiler, so in their help-files they use "word" for a 16-bit integer
Did I say 2 words were 64 bits back in my first post in this thread? I don't know how I came up with that number. I'm serious. I don't remember it ever occurring to me that 1 word would be anything but 16 bits, and I don't know how I multiplied 16 * 2 and came up with 64 instead of 32 -- though I remember making similar mistakes in the past.
Also I carried the error along to my subsequent posts, because I relied on what I said in my first post, without checking the arithmetic.
I always think nibble, byte, word ; 4, 8, 16.
I appreciate your pointing out my error, Alex, as I might have gone on with it -- for weeks or months -- before noticing it.
useless facts... :D
"word" is usually used for the native data-type of the architecture... The only exception to that rule that I can think of is intel assembly, where a word is always 16 bits, and dword is always 32 bits, independent of the state of the processor. Useless fact: some early IBM mainframes had a native datatype of 36 bits (and a byte containing 9 bits!); almost all architectures have a native datatype of "1<<n" bits, but the definition of the C programming language supports other sizes! (for example: x86 (80 bit) and Macintosh (96 bit) extended precision floating point types)
"word" is usually used for the native data-type of the architecture... The only exception to that rule that I can think of is intel assembly, where...
Moral to this story: the same word can mean different things in different places, or different contexts; note context when listening, and indicate context when speaking -- or attempts to communicate will degrade into orgies of screaming and hair-pulling.
I think that you shouldn't use 16-bit compilers like Turbo C anymore.
Here is a free 32-bit development environment for C/C++.
It's the gcc port for DOS.
Downloaded it already, klausi. It is a large set of tools for DOS and Windows, 16- and 32-bit C and C++ programming. It takes a long time to download, and a long time to set up so that it works. Plus its documentation doesn't seem to be an exhaustive reference for C and C++ itself, just documentation of how to use the tools included.
Since I am only learning the rudimentary fundamentals of C I don't need all that stuff yet. I am not a professional applications developer and it is unlikely that I will ever learn enough about programming to become one -- unless people want to pay me for my little utilities to figure out the equally tempered pitch on their piano and all the beat rates for all coincident partials for all the various pitch intervals, starting from any value of A4 (88:A49) (usually circa 440 Hz) (though they could do this with a bunch of pages in Excel) or some other little utility that saves them from having to do repetitious key-ins on a calculator.