binary file reading.....

**roaan** · 08-27-2009

I have to read a binary file and i just created one binarytext.txt and inserted into it the following sequence of characters.

00000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000 0000000000000

Now i have to read this file. So what i do is this:

Code:

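#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	FILE *fp;
	int i;

	/* argv[0] is the program name, so a file name requires argc >= 2.
	   Check this before touching argv[1]. */
	if (argc < 2)
	{
		printf("\n You must enter the name of the binary file from which to read the data");
		exit(EXIT_FAILURE);
	}

	fp = fopen(argv[1], "rb");
	if (fp == NULL)
	{
		printf("\n Unable to open the file. Aborting");
		exit(EXIT_FAILURE);
	}

	/* fread returns the number of items read; anything but 1 is a failure. */
	if (fread(&i, sizeof(i), 1, fp) != 1)
	{
		printf("\n Unable to read %u bytes from the file", (unsigned)sizeof(i));
		fclose(fp);
		exit(EXIT_FAILURE);
	}
	printf("\n Data that was read from the binary file is %d", i);

	fclose(fp);
	return 0;
}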

Now i was expecting the output to be 0 (all the bits are 0 in the file and fread would read 4 bytes). But the output i am getting is some very huge number:

Data that was read from the binary file is 808464432Press any key to continue . . .

Why is it so? Am i doing something wrong? Because i am just opening the file in rb mode and reading 32 bits using fread.

**Dino** · 08-27-2009

Perhaps you have the character (number) '0', which is 0x30 ascii.

**sean** · 08-27-2009

00000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000 0000000000000

Did you enter that into a file with a text editor? Because if you did, and your system uses ascii, then you just entered the number '48' a bunch of times.
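To make the arithmetic concrete, here is a minimal sketch (the helper name `bytes_as_u32` is made up for illustration, and it assumes an ASCII system and a 4-byte int like the original program). Four '0' characters are four bytes of 0x30, and reading them as one 32-bit integer gives 0x30303030, which is exactly 808464432. Since all four bytes are identical, byte order does not matter here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret four raw bytes as one 32-bit integer,
   which is what fread(&i, sizeof(i), 1, fp) does for a 4-byte int. */
uint32_t bytes_as_u32(const unsigned char bytes[4])
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);   /* copy the bytes unchanged */
    return value;
}
```

Passing the four characters `{'0', '0', '0', '0'}` yields 808464432, while four real zero bytes `{0, 0, 0, 0}` yield 0.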

**roaan** · 08-27-2009

Originally Posted by sean

Did you enter that into a file with a text editor? Because if you did, and your system uses ascii, then you just entered the number '48' a bunch of times.

Yes i did use a text editor to create that file. So what should i do to get a binary file. I mean should i get some .bmp file and try to read that. That should act as a good substitute for my foolishness of creating a binary file by myself. :-)

**sean** · 08-27-2009

I mean should i get some .bmp file and try to read that.

If you want - you could also read the current file, but with the understanding that each byte is going to be an ASCII representation of a character. Another option is to create a binary file in C. Maybe output a bunch of integers, and try to read them back in.
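A rough sketch of that second option, assuming nothing beyond the standard library: the function name `round_trip_ints` is invented for this example, and it uses `tmpfile()` instead of a named file so it leaves nothing behind. It writes raw ints with `fwrite` and reads them back with `fread`.

```c
#include <stdio.h>

/* Hypothetical helper: write `count` ints to a temporary binary file,
   rewind, and read them back into `out`.
   Returns the number of ints read back, or 0 on failure. */
size_t round_trip_ints(const int *in, int *out, size_t count)
{
    FILE *fp = tmpfile();      /* temporary file, opened in "wb+" mode */
    size_t got = 0;

    if (fp == NULL)
        return 0;

    /* fwrite stores the in-memory bytes of each int, not text digits. */
    if (fwrite(in, sizeof *in, count, fp) == count) {
        rewind(fp);
        got = fread(out, sizeof *out, count, fp);
    }
    fclose(fp);
    return got;
}
```

A file written this way really does contain four zero bytes for the integer 0, so reading it back with the original program would print 0.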

**MK27** · 08-27-2009

Originally Posted by roaan

Yes i did use a text editor to create that file. So what should i do to get a binary file. I mean should i get some .bmp file and try to read that. That should act as a good substitute for my foolishness of creating a binary file by myself. :-)

A binary file is exactly the same as a text file, more or less. The distinction is made because certain BYTE values represent special things in text, such as the null terminator (0) and newline (10) and the microsoft newline, which is two bytes.

But a BYTE has eight BITS in it, so you cannot set BIT values by typing characters: a single character is a whole BYTE, not a BIT.

However (I could be wrong) I do not think there are "binary file formats" that use bit values beyond their place in the byte. So perhaps you have been confused by the term "binary file format"? Usually, the term "binary file" just refers to a file that has 0's in it, which are not good in a text file, since a C string* would be terminated by a 0 value. (NB. a 0 value is not the same as the character '0', which as sean implied earlier has a byte value of 48, that is: 00110000, the bits set being 32 and 16). So a text string "0011" has a byte value of 48-48-49-49, qv the ascii table.

* and hence text strings in pretty much every other language as well.
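As an illustration of the "0011" example, this sketch (with a made-up helper name, `byte_values`, and assuming an ASCII system) stores the numeric value of each character of a string so the values can be inspected or printed with `%d`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store the numeric value of each character of
   `text` in `out` (at most `max` values). Returns how many were stored. */
size_t byte_values(const char *text, int *out, size_t max)
{
    size_t n = strlen(text);

    if (n > max)
        n = max;
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)text[i];   /* the byte behind the glyph */
    return n;
}
```

For the string "0011" on an ASCII system this stores 48, 48, 49, 49.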

**MK27** · 08-27-2009

Okay, here is explanation #2 which I will include in the book*. With a text file, the byte values are usually presumed to be characters, from the ascii table. So this sequence of bytes:

40 145 154 154 157 32 127 157 162 154 144

is English for "ciao mondo", which if you don't understand Italian, check the ascii table DECIMAL values (since byte values are often in hex, 0x30 -- Dino -- being 48 -- sean) to solve the mystery.

Anyway, with a "binary file" such as an image like a .bmp, the byte values have a totally different meaning because the file is not intended to be rendered as text. These are also considered UNSIGNED values, which means they range 0-255 (the ascii table only defines 0 to 127, and a plain char is often SIGNED, so -128 to 127). Remember, 10000000 as a signed byte is -128. Unsigned, it is 128.

Is it starting to get clearer? Really, the term "binary file" is not so good, probably a better distinction would be between "text" and "data" files.
You could view a text file as "binary" in some context (eg, given a proper header, which is the first few dozen bytes, and renamed .bmp, you could look at noise patterns in a viewer) and binary files as text (but usually the text ends prematurely because of a zero byte, and you get lots of � stuff).

* you did not scratch a banana here, roaan, but I can remember being confused by this also.
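A small sketch of the signed versus unsigned point (the function names are made up for illustration): the same 8-bit pattern is read once as a signed and once as an unsigned value. It uses `int8_t`, which is two's complement by definition, so the pattern 10000000 (0x80) comes out as -128 signed and 128 unsigned.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read one byte's bit pattern as a signed 8-bit value. */
int as_signed(uint8_t byte)
{
    int8_t s;
    memcpy(&s, &byte, 1);   /* same bits, different interpretation */
    return s;
}

/* Hypothetical helper: read the same bit pattern as an unsigned value. */
int as_unsigned(uint8_t byte)
{
    return byte;
}
```

Values up to 127, such as the 48 behind the character '0', come out the same either way; only patterns with the top bit set differ.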

**roaan** · 08-27-2009

I seem to understand a little: the file I created is really an array of characters. The file is a sequence of 0's, which I was interpreting as 0 bits stored in the computer, and I was wrong in that understanding. You say that I can still read the file I created in binary mode, as long as I keep in mind that each '0' I read is a set of 8 bits with the 16 and 32 bit values set. But how do I make my program understand that when it comes across the char '0', that is equivalent to 00110000?

**MK27** · 08-27-2009

Originally Posted by roaan

But how do i make my program understand that when it comes across the char '0' it implies that is equivalent to 00110000.

It does understand that, except backwards. What literally happens is the byte, 00110000, is understood as a signed char with a value of 32+16=48.

Code:

char X=48;
printf("%c",X);

The output is 0.

Read my last post again, I added some stuff after you posted.

Anyway, it depends what you want to do with "the data". This is why I said AFAIK there is no such thing as a "binary file" where the significant unit is actually the bit. It is always the byte, which is 8 bits. Certainly, almost all image data (like "bit"maps) are like that. Probably/possibly they could compress down to a bit level; this really depends on the format. To do that in C, I suppose you would just translate the byte value into an array of 8 booleans, or use some kind of struct with members like:

Code:

unsigned char bit1:1;
unsigned char bit2:1;

The :1 indicates one bit.
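The "array of 8 booleans" route could look roughly like this. It is only a sketch: `unpack_bits` is a made-up name, and putting the most significant bit first is an arbitrary choice made here so the result reads like 00110000 on paper.

```c
/* Hypothetical helper: split one byte into its 8 bits,
   most significant bit first. */
void unpack_bits(unsigned char byte, int bits[8])
{
    for (int i = 0; i < 8; i++)
        bits[i] = (byte >> (7 - i)) & 1;   /* shift bit i down and mask it */
}
```

Calling it with the character '0' (byte value 48 in ASCII) fills the array with 0 0 1 1 0 0 0 0.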

**roaan** · 08-27-2009

Originally Posted by MK27

Okay, here is explanation #2 which I will include in the book*. With a text file, the byte values are usually presumed to be characters, from the ascii table. So this sequence of bytes:

40 145 154 154 157 32 127 157 162 154 144

is English for "ciao mondo", which if you don't understand Italian, check the ascii table DECIMAL values (since byte values are often in hex, 0x30 -- Dino -- being 48 -- sean) to solve the mystery.

I seem to be getting overwhelmed by what i am trying to understand. You say that the sequence of bytes you listed is equivalent to ciao mondo. My interpretation is that when i convert 'c' to its ascii equivalent it is supposed to be 99. What actually gets stored in the computer is the same value, which in hex is 0x63 and in binary is 01100011. So what is 40 implying here?

**MK27** · 08-27-2009

Originally Posted by roaan

I seem to be getting overwhelmed by what i am trying to understand.

Hey sorry, I'm getting all fancy now I've sold more than 39,000 copies

The English translation of ciao mondo is:

Hello World

A bitmap header is 54 bytes. That plus the "Hello World" byte sequence and you would have a small squiggly picture.

**tabstop** · 08-27-2009

Originally Posted by MK27

Okay, here is explanation #2 which I will include in the book*. With a text file, the byte values are usually presumed to be characters, from the ascii table. So this sequence of bytes:

40 145 154 154 157 32 127 157 162 154 144

is English for "ciao mondo", which if you don't understand Italian, check the ascii table DECIMAL values (since byte values are often in hex, 0x30 -- Dino -- being 48 -- sean) to solve the mystery.

Anyway, with a "binary file" such as an image like a .bmp, the byte values have a totally different meaning because the file is not intended to be rendered as text. These are also considered UNSIGNED values, which means they range 0-255 (the ascii table only defines 0 to 127, and a plain char is often SIGNED, so -128 to 127). Remember, 10000000 as a signed byte is -128. Unsigned, it is 128.

Is it starting to get clearer? Really, the term "binary file" is not so good, probably a better distinction would be between "text" and "data" files.
You could view a text file as "binary" in some context (eg, given a proper header, which is the first few dozen bytes, and renamed .bmp, you could look at noise patterns in a viewer) and binary files as text (but usually the text ends prematurely because of a zero byte, and you get lots of � stuff).

* you did not scratch a banana here, roaan, but I can remember being confused by this also.

I think your conversions are off, MK -- you went to a great deal of trouble to mention that characters stop at 127, then list a bunch of characters in the 140s and 150s?

**MK27** · 08-27-2009

Originally Posted by tabstop

I think your conversions are off, MK -- you went to a great deal of trouble to mention that characters stop at 127, then list a bunch of characters in the 140s and 150s?

Thanks tabstop, I will fire my current editor -- are you available??

whoops

I obviously mixed and matched OCTAL* and DECIMAL values, and in addition, misread the columns in the table. Wow (looking at that, I actually did it consistently!) So:

72 101 108 108 111 32 87 111 114 108 100

Many apologies to roaan, I hate to think I am reducing the possibility for "trust" in the world thru my silliness and screw-ups...so next time I'll stick with hex

* why there are octal values in this table I dunno
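For anyone who wants to check the corrected sequence without the table, a minimal sketch (the helper name `codes_to_text` is invented here, and it assumes an ASCII system) turns a list of byte values into a C string. In C source, a literal with a leading 0 such as 0110 is octal, so the same helper also accepts the octal column of the table.

```c
#include <stddef.h>

/* Hypothetical helper: turn a list of byte values into a NUL-terminated
   string. `out` must have room for count + 1 chars. */
void codes_to_text(const int *codes, size_t count, char *out)
{
    for (size_t i = 0; i < count; i++)
        out[i] = (char)codes[i];   /* each value becomes one character */
    out[count] = '\0';
}
```

Feeding it 72 101 108 108 111 32 87 111 114 108 100 gives "Hello World"; the octal values 0110 0145 0154 0154 0157 give "Hello".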

**Sebastiani** · 08-27-2009

Originally Posted by roaan

Yes i did use a text editor to create that file. So what should i do to get a binary file. I mean should i get some .bmp file and try to read that. That should act as a good substitute for my foolishness of creating a binary file by myself. :-)

A computer encodes all of the information within it in binary. It doesn't have to be that way - you could design a decimal computer, but it just turns out that it's easier/cheaper/reliable/etc to manufacture them that way, plus binary encoding has some useful mathematical properties that make it superior to perhaps any other encoding scheme. Now, a number is a number, and you can interpret it in any conceivable number base. Let's say you decide to use only three digits to represent a number. If you wrote out a column of numbers, ascending from zero, and then next to it the decimal equivalent, you'd get:

Code:

BASE-3	BASE-10
--------	--------
0		0
1		1
2		2
10		3
11		4
12		5
20		6
21		7
22		8
100		9
101		10
102		11
110		12
111		13
112		14
120		15
121		16
122		17
200		18
201		19
202		20
210		21
211		22
212		23
220		24
221		25
222		26
1000	27

The numbers are the same, the only difference is the representation (eg: number base) used.
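The table can be reproduced with a few lines of C. This is just a sketch with a made-up function name, `to_base`, limited to bases 2 through 10 so that every digit is a plain '0' to '9' character.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: write `n` in the given base (2..10) into `out`
   as a digit string. Returns 0 on success, or -1 if the base is
   unsupported or `out` is too small. */
int to_base(unsigned n, unsigned base, char *out, size_t size)
{
    char tmp[sizeof(unsigned) * CHAR_BIT];   /* enough even for base 2 */
    size_t len = 0;

    if (base < 2 || base > 10)
        return -1;

    do {                                     /* least significant digit first */
        tmp[len++] = (char)('0' + n % base);
        n /= base;
    } while (n != 0);

    if (len + 1 > size)
        return -1;

    for (size_t i = 0; i < len; i++)         /* reverse into the output */
        out[i] = tmp[len - 1 - i];
    out[len] = '\0';
    return 0;
}
```

With base 3, the value 27 comes out as "1000" and 5 as "12", matching the table; with base 2, the value 48 comes out as "110000".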

Now, you don't have to use digits to encode a number, either. ASCII is one such encoding (then again, so is HEX!). Instead of just numbers it employs "glyphs" (such as the ones you are reading at this moment). There are many other encoding schemes, as well. The point is, the underlying numbers are the same - only the interpretation has changed. When I say '0' you know that this is the number 48 in decimal, 30 in hex, etc. As long as we know which convention is being used there is no ambiguity (as implied by the joke: "There are 10 kinds of people in this world - those who understand binary and those who do not.").

So at some point hardware manufacturers decided that instead of making each individual bit addressable, they'd make a group of 8 bits the smallest addressable unit, eg the byte (or octet). Again, it doesn't have to be that way - they could have gone with a 13-bit byte, say (and some machines in fact do use a different convention). So naturally, when you access a chunk of memory, or read from a file, it's done in blocks of 8 bits (as the smallest unit).

Ok, so now let's say you open up a text editor and enter 10101010. It maps each glyph to the appropriate numerical value (in this case: 49, 48, 49, 48, 49, 48, 49, 48), and those values are then stored as binary data on the disk and in memory. Obviously then, a text-editor is just a specialized ASCII-to-circuitry-encoded-binary conversion program of sorts (whereas a hex-editor converts a subset of ASCII ('0'-'9', 'A'-'F') to binary (again, in circuitry)). Naturally, then, if you want to convert the subset of ASCII ('0'-'1'), a text-editor is the wrong tool for the job - you need a specialized program.
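Such a specialized program does not need much. The core of it might look like the sketch below, where `pack_bits` is a hypothetical name and "8 characters per byte, most significant bit first" is an assumption made for this example rather than any standard. Anything other than '0' or '1' is rejected, including the spaces in the original poster's file.

```c
#include <stddef.h>

/* Hypothetical helper: pack a string of '0'/'1' characters into real
   bytes, 8 characters per byte, most significant bit first.
   Returns the number of bytes written, or -1 if the text contains any
   other character, its length is not a multiple of 8, or `out` (with
   room for `max` bytes) is too small. */
long pack_bits(const char *text, unsigned char *out, size_t max)
{
    size_t nbits = 0, nbytes = 0;
    unsigned char current = 0;

    for (; *text != '\0'; text++) {
        if (*text != '0' && *text != '1')
            return -1;
        if (nbytes == max)
            return -1;
        /* shift earlier bits up and append the new one */
        current = (unsigned char)((current << 1) | (*text - '0'));
        if (++nbits == 8) {
            out[nbytes++] = current;
            current = 0;
            nbits = 0;
        }
    }
    return nbits == 0 ? (long)nbytes : -1;
}
```

Given the text "10101010" it produces the single byte 0xAA (170), and thirty-two '0' characters produce four real zero bytes, which is what the original program expected to find in its file. The resulting bytes could then be written out with `fwrite`.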

Also, you should always remember that the meaning of the binary data in a computer is strictly dependent on the program processing it, eg: file-type A could confuse a program that processes file-type B into thinking it's working with legitimate data (the more complex the format the more unlikely this is, of course), and while some programs only view the data as bytes, others may be interested in certain fields of each byte (eg: 3-bit chunks) or perhaps some data type larger than a byte (contiguous chunks of 64-bit numbers, say). Huffman encoding is a good example: the length of each code in the compressed data is variable (from 1 bit up to, in principle, 255 bits for a 256-symbol alphabet) and differs from one compressed file to another!

Well, I hope that helps clear things up for you, anyway.
