1. ## Reading 16-bit numbers from a RAW file

Hi.

First of all, just to let you know, I'm a beginner and I can barely use structures and (maybe) lists. ^^'

What I'm trying to do is write a program that reads 16-bit values from a headerless RAW file, compares them, and writes new values to another RAW file.

Nothing too difficult, except that I'm having problems reading 16-bit numbers from the input RAW file.

Plus, I have to deal with little/big endian byte order, which, however, I've already sorted out by using char arrays.

But now I need to know how to reassemble the two bytes into a single 16-bit number that I can use for calculations. =p

Any idea or tutorial I could use? =|

Thank you.

2. Look into bitwise operators, specifically shift (<< and >>), OR and AND (| and &). We have a tutorial on this site (Tutorials - Bitwise Operators and Bit Manipulations in C and C++ - Cprogramming.com) and Google will turn up tons more.

Make sure you always use unsigned integers when using the shift operators*. I recommend using uint16_t from the stdint.h header, to ensure you have a 16 bit number**.

* There are exceptions to this, but they are few and you need not worry about them now.
** You could use short, but technically the standard only guarantees that it is at least 16 bits wide, it could be more (though I know of no system that uses shorts larger than 16 bits).
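
As a minimal sketch of the shift-and-OR approach described above, the two functions below rebuild a 16-bit value from two bytes, one function per byte order. The names `read_u16_le` and `read_u16_be` are made up for this example, and the bytes are assumed to already sit in an `unsigned char` array.

```c
#include <stdint.h>

/* Combine two bytes stored low byte first (little-endian order). */
uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

/* Combine two bytes stored high byte first (big-endian order). */
uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}
```

For the byte pair FF 7F, `read_u16_le` gives 32767 and `read_u16_be` gives 65407, whatever the byte order of the machine running the code.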

3. Actually, in the meantime I've found my own solution! ^^

I've learnt about unions and used this method to solve my problem:

Code:
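```
FILE *ptrRAWread; /* input file, opened elsewhere in binary mode */
long lSize;       /* size of the input file in bytes, obtained elsewhere */
size_t result;

union twobytes
{
    int intvalue;
    char byte[sizeof (int)];
} bytes;

char *buffer;

buffer = (char*) malloc (sizeof (char) * lSize);
if (buffer == NULL) {fputs ("\nMemory error\n", stderr); exit(2);}

/* read the whole file into the buffer */
result = fread (buffer, 1, lSize, ptrRAWread);
if (result != lSize) {fputs ("\nReading error\n", stderr); exit(3);}

bytes.byte[0] = buffer[0];
bytes.byte[1] = buffer[1];
bytes.byte[2] = 0;
bytes.byte[3] = 0;

printf ("\nbytes.intvalue: %i\n", bytes.intvalue);
```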

Obviously I'll use a "for" loop to run through the bytes.byte char array for both reading and writing values in files. Is it good? =)

The good thing about this method, if I didn't misunderstand little/big endian, is that I don't need to swap the bytes, because the system stores the integer variable in the same byte order as the integer values in my RAW file.

So the RAW file says "FF 7F", I read char[0] = "FF" and char[1] = "7F", and the integer value will be "32767". It's perfect!!! ^0^

Just one doubt, now:
I've used the "char *buffer" method from an example for reading files, but my RAW files can easily reach sizes as big as 512MB... As far as I know, I could never have such a large amount of contiguous free memory (despite my 12GB of RAM).
Any alternative, something halfway between continuously reading from the file and storing the whole thing in memory? Or maybe reading files byte-by-byte is not as bad as it looks? =o

PS: I use Qt Creator to compile... I hope this doesn't offend your tastes; a friend suggested I use it. =|

4. Yes you can allocate 512MB at once... but that also depends on the compiler and environment (operating system). So when you malloc, you should check that it succeeded. Always.
Yes it's certainly possible to allocate some amount and read that amount into it. Write out the changed data to the new file and repeat as necessary.
You may not have to worry about endian at all if you're comparing data byte-by-byte.

5. Originally Posted by Spiegel
The good thing about this method, if I didn't misunderstand little/big endian, is that I don't need to swap the bytes, because the system stores the integer variable in the same byte order as the integer values in my RAW file.

So the RAW file says "FF 7F", I read char[0] = "FF" and char[1] = "7F", and the integer value will be "32767". It's perfect!!! ^0^
This will work only on systems where int is little-endian and sizeof(int) == 4. If sizeof(int) is smaller than 4, you are accessing elements in the byte array beyond its size, and if it's larger than 4, you are leaving elements uninitialized. On big-endian or mixed-endian systems, it'll give you a completely wrong value (the bytes will be reversed on big-endian). If you only care about your code running on 32-bit x86 systems or other similar systems, your method should work fine. Otherwise I would do something like this for increased portability:
Code:
`uint16_t value = ((unsigned char)buffer[1] << 8) | (unsigned char)buffer[0];`
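
Since the original question also involves writing values back out, here is a rough sketch of the reverse direction, assuming the output file should store the low byte first, like the input. The name `store_u16_le` is invented for this example.

```c
#include <stdint.h>

/* Split a 16-bit value into two bytes, low byte first.
   dst must point to at least two writable bytes. */
void store_u16_le(unsigned char *dst, uint16_t value)
{
    dst[0] = (unsigned char)(value & 0xFF); /* low byte  */
    dst[1] = (unsigned char)(value >> 8);   /* high byte */
}
```

The two bytes can then be passed to fwrite, and the result does not depend on the byte order of the machine running the program.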

6. This code to construct the integer isn't endian-safe:

Code:
```
bytes.byte[0] = buffer[0];
bytes.byte[1] = buffer[1];
bytes.byte[2] = 0;
bytes.byte[3] = 0;
```
Fine for a little endian system but not a big endian one. I think you'll have to have different little and big endian code there. At least your union method makes it easy to work out at runtime if you're on a big or little endian system.

Other than that though you're okay with endianness.
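
A minimal sketch of the runtime check mentioned above, using a union the same way the original code does. The function name `host_is_little_endian` is made up for this example; it only distinguishes little endian from everything else.

```c
#include <stdint.h>

/* Returns 1 if the machine running the program stores the low byte
   of a 16-bit value first (little endian), 0 otherwise. */
int host_is_little_endian(void)
{
    union {
        uint16_t value;
        unsigned char byte[2];
    } probe;

    probe.value = 1;           /* low byte is 1, high byte is 0 */
    return probe.byte[0] == 1; /* is the low byte stored first? */
}
```

A program could call this once at startup and choose the matching byte assignment, although assembling the value with shifts avoids the need for the check entirely.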

Originally Posted by Spiegel
I've used the "char *buffer" method from an example for reading files, but my RAW files can easily reach sizes as big as 512MB... As far as I know, I could never have such a large amount of contiguous free memory (despite my 12GB of RAM).
I wouldn't try to malloc 512MB. There's no rule that says you can't, and it may well succeed, but you shouldn't hold loads and loads of data in memory unless you really need to. If you're worried about endianness then you're obviously concerned with portability, and mallocing huge stuff may be less likely to succeed on other systems. Reading byte by byte wouldn't be very efficient, as you say, so yeah, just read a chunk at a time.

Code:
```
char buffer[1024]; // or malloc if you prefer, even though it's small you should still check the return value as nonoob said!
size_t nread, i;

// the "=" isn't a typo, fread returns the number of bytes read. Assignment also returns the value assigned, so the loop will stop when fread returns 0.
while ((nread = fread(buffer, 1, sizeof buffer, fp))) // fp is a placeholder name for your input FILE*
{
    for (i = 0; i < nread; i++)
    {
        // do processing
    }
}
```

7. Originally Posted by christop
This will work only on systems where int is little-endian and sizeof(int) == 4. If sizeof(int) is smaller than 4, you are accessing elements in the byte array beyond its size, and if it's larger than 4, you are leaving elements uninitialized. On big-endian or mixed-endian systems, it'll give you a completely wrong value (the bytes will be reversed on big-endian). If you only care about your code running on 32-bit x86 systems or other similar systems, your method should work fine. Otherwise I would do something like this for increased portability:
Code:
`uint16_t value = ((unsigned char)buffer[1] << 8) | (unsigned char)buffer[0];`
That was test code, written to fit one specific case so I wouldn't waste time writing more.

In the final code I've made a "for" loop to initialize all the array elements, limited to "sizeof(unsigned int)-1".

And since I'm using values between 0 and 65535, I only need the first two bytes, so there will never be problems in assigning values byte by byte. =)

Should I post the code as it is now? =o

Anyway, yep, it's meant to work only on Windows systems and probably only on my system... ^^

PS Edit: my knowledge doesn't let me understand what you wrote as a template; it's like trying to read Russian, for me. =p
Is that some kind of data type definition? Or what? =\

8. I came up with the union idea just because of the little/big endian problem, but I wasn't planning on portability. I was concerned only because I noticed that the RAW file is written with each pair of bytes swapped, so I'd end up with messed-up values if I read them as they are.

About the large amount of data: since I'm working with RAW files, what I really need is to load about 3 "rows" of pixels at a time for comparing values, so it shouldn't be a problem even with files as large as 16384x16384 pixels.
The downside is that I'm still going to need to move by one row at a time, but that shouldn't be too time-consuming. It would be something like "load abc and compare", then "load bcd and compare", and again "load cde and compare". It should work fine, and better than loading the whole thing into memory.
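
One possible way to sketch the "abc, bcd, cde" idea: keep three row buffers and rotate their pointers, so only one new row is read from the file per step. The function name `scan_three_rows` and its parameters are invented for this example, and the comparison itself is left as a placeholder comment.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: walks a headerless file made of rows that are
   row_bytes wide, keeping only three consecutive rows in memory.
   Returns the number of three-row windows visited, or -1 if memory
   allocation fails. */
long scan_three_rows(FILE *in, size_t row_bytes)
{
    unsigned char *row[3];
    unsigned char *oldest;
    long windows = 0;
    int i;

    if (row_bytes == 0)
        return 0;

    for (i = 0; i < 3; i++) {
        row[i] = malloc(row_bytes);
        if (row[i] == NULL) {
            while (i-- > 0)
                free(row[i]);
            return -1;
        }
    }

    /* Fill the first window: rows a, b, c. */
    if (fread(row[0], 1, row_bytes, in) == row_bytes &&
        fread(row[1], 1, row_bytes, in) == row_bytes &&
        fread(row[2], 1, row_bytes, in) == row_bytes) {
        do {
            /* compare row[0], row[1] and row[2] here */
            windows++;

            /* Rotate the pointers so the oldest buffer receives the next row. */
            oldest = row[0];
            row[0] = row[1];
            row[1] = row[2];
            row[2] = oldest;
        } while (fread(row[2], 1, row_bytes, in) == row_bytes);
    }

    for (i = 0; i < 3; i++)
        free(row[i]);
    return windows;
}
```

For a 16384-pixel-wide image with 16-bit pixels, row_bytes would be 32768, so the three buffers together take under 100KB no matter how large the file is.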

Just one noob question, if you let me: which systems use little endian and which ones big endian? =o

9. The list of each type is at the bottom of this page:
Endianness - Wikipedia, the free encyclopedia

10. x86 (PCs and Intel Macs) is little endian. PowerPC (old Macs and the PlayStation 3) is big endian. Sun SPARC (a server CPU architecture) is big endian. Some architectures, like ARM, have configurable endianness (I believe there's a register you can write to that makes the system little/big endian; then, of course, all the multi-byte data you had in memory prior to that point would be invalid).

Just a few examples.

11. Originally Posted by cyberfish
x86 (PCs and Intel Macs) is little endian. PowerPC (old Macs and the PlayStation 3) is big endian. Sun SPARC (a server CPU architecture) is big endian. Some architectures, like ARM, have configurable endianness (I believe there's a register you can write to that makes the system little/big endian; then, of course, all the multi-byte data you had in memory prior to that point would be invalid).

Just a few examples.

Good, so running this program on a PC will never cause a problem with endianness. =)