Storing a float in 16 bits

**kara3434** · 07-31-2007

I need to store 10 million+ floats in memory for a program I'm writing, so I wanted to be able to store them in 2 bytes instead of 4. I was wondering if there is a good way to truncate a float down to 2 bytes and if needed go up from the 2 byte representation to a float.

Thanks!

**Salem** · 07-31-2007

It boils down to range and precision.
Can you maintain the range and precision you want in 16 bits?

**kara3434** · 07-31-2007

Originally Posted by Salem

It boils down to range and precision.
Can you maintain the range and precision you want in 16 bits?

Yeah, I'm willing to sacrifice precision, and in terms of range, the values I want to store only vary from -100 to +100, so range is not a big issue

**matsp** · 07-31-2007

There are some standard 16-bit floating point formats, but they are pretty limited in usability - graphics processors sometimes use them to store "floating point pixels".

What range are your numbers? Would it be suitable to store them in a 12-bit signed integer and 4-bit fraction part?

The following does that. Note that you lose quite a bit of precision this way, but you can't have both a compact format and a lot of precision.

Code:

#include <stdio.h>
#include <math.h>

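/* Encode as 12.4 fixed point: magnitude in the low bits, sign applied last. */
short ftofix16(float num) {

  short i, f, r;
  float a = fabs(num);

  if (a > 2047.999f) {
    printf("Error: number out of range (num=%f)\n", num);
    a = 2047.9375f;  /* clamp to the largest representable magnitude */
  }

  i = (short)a;                 /* whole part of the magnitude */
  f = (short)(a * 16) & 15;     /* fraction in sixteenths */
  r = (i << 4) | f;
  /* Negate the whole encoded value so that fix16tof() can undo it;
     OR-ing the fraction into a negative whole part gives wrong results. */
  return (num < 0) ? -r : r;
}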

float fix16tof(int n)
{
  float s = 1.0f;
  if (n < 0) {
    s = -1.0f;
    n = -n;
  }
  return s * ((float)(n >> 4) + ((n & 15) / 16.0f));
}

int main(int argc, char **argv) {
  float f, g, h; 
  short a, b, c;
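  /* Stop on EOF or malformed input instead of looping forever. */
  while (scanf("%f %f %f", &f, &g, &h) == 3) {
    a = ftofix16(f);
    b = ftofix16(g);
    c = ftofix16(h);
    /* Mask to 16 bits so negative values are not printed sign-extended. */
    printf("%04x, %04x, %04x\n",
           (unsigned)(a & 0xffff), (unsigned)(b & 0xffff), (unsigned)(c & 0xffff));
    printf("%f, %f, %f\n", fix16tof(a), fix16tof(b), fix16tof(c));
  }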
  return 0;
}

**matsp** · 07-31-2007

I wrote up the code shown above whilst replying, so I didn't know the range. With such a small number of significant digits, you could go for "8.8". The code would need to change from shifting by 4 to shifting by 8 and & 15 to & 255. The multiplication by 16 should be multiplication by 256. Otherwise same idea.

--
Mats

**kara3434** · 07-31-2007

Thanks mats! That was really helpful.

**brewbuck** · 07-31-2007

Originally Posted by kara3434

Yeah, I'm willing to sacrifice precision, and in terms of range, the values I want to store only vary from -100 to +100, so range is not a big issue

Then use an 8.8 fixed point representation. That gives a range of the whole part from -128 to 127, and steps of 1/256 in the fractional part.
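
For reference, a minimal sketch of the 8.8 idea in C. The names float_to_fix88 and fix88_to_float are made up for illustration and do not come from anyone's post. The sketch assumes the value is simply scaled by 256, rounded to the nearest step and kept in an int16_t, with out-of-range input clamped rather than reported.

```c
#include <stdint.h>

/* Hypothetical helper: encode a float as 8.8 fixed point (value * 256). */
static int16_t float_to_fix88(float x)
{
    float scaled = x * 256.0f;

    /* Clamp so the conversion below can never overflow an int16_t. */
    if (scaled > 32767.0f)  scaled = 32767.0f;
    if (scaled < -32768.0f) scaled = -32768.0f;

    /* Round half away from zero, then truncate to an integer. */
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    return (int16_t)(int)scaled;
}

/* Hypothetical helper: decode 8.8 fixed point back to a float. */
static float fix88_to_float(int16_t q)
{
    return (float)q / 256.0f;
}
```

With this scheme float_to_fix88(3.14159f) gives 804, which decodes to 3.140625. For values between -100 and +100 the rounding error stays within 1/512.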

**iMalc** · 07-31-2007

If you would rather have a 16-bit float than a 16-bit fixed point value, then check out my website (link in sig). Go to the Useful classes page. You should find Shortfloat there.
It's C++, but you can break it up into lots of little C functions instead.
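
To give an idea of what a 16-bit float conversion involves, here is a rough C sketch. It is not the Shortfloat class mentioned above. The function names are hypothetical, and it assumes the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 mantissa bits) and a 32-bit IEEE 754 float. To stay short it truncates instead of rounding, flushes values too small for a normal half to zero, and turns anything too large (including NaN) into infinity.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-bit float -> 16-bit half (simplified, truncating). */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* read the raw bit pattern */

    uint32_t sign = (bits >> 16) & 0x8000u;
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15;  /* rebias */
    uint32_t mant = bits & 0x7FFFFFu;

    if (exp <= 0)                              /* too small: signed zero */
        return (uint16_t)sign;
    if (exp >= 31)                             /* too large, inf or NaN: infinity */
        return (uint16_t)(sign | 0x7C00u);

    /* Keep the top 10 of the 23 mantissa bits. */
    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
}

/* Hypothetical helper: 16-bit half -> 32-bit float (subnormals read as zero). */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0)
        bits = sign;                                        /* zero */
    else if (exp == 31)
        bits = sign | 0x7F800000u | (mant << 13);           /* inf or NaN */
    else
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

One thing to keep in mind for data in the -100 to +100 range: between 64 and 128 a half-precision float has a step of 1/16, which is coarser than the 1/256 step of 8.8 fixed point. The float format only wins on precision for values close to zero.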
