Thread: binary file reading.....

  1. #1
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305

    binary file reading.....

    I have to read a binary file, so I created one, binarytext.txt, and typed the following sequence of characters into it.

    00000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000 0000000000000

    Now I have to read this file. This is what I do:

    Code:
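    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char *argv[])
    {
    	FILE *fp;
    	int i;
    
    	/* Check the argument count before touching argv[1]. */
    	if (argc < 2)
    	{
    		printf("\n You must enter the name of the binary file from which to read the data");
    		exit(EXIT_FAILURE);
    	}
    
    	fp = fopen(argv[1], "rb");
    	if (fp == NULL)
    	{
    		printf("\n Unable to open the file. Aborting");
    		exit(EXIT_FAILURE);
    	}
    
    	/* Read one int's worth of raw bytes; fread returns the number of items read. */
    	if (fread(&i, sizeof(i), 1, fp) != 1)
    	{
    		printf("\n Unable to read an int from the file. Aborting");
    		fclose(fp);
    		exit(EXIT_FAILURE);
    	}
    	printf("\n Data that was read from the binary file is %d", i);
    
    	fclose(fp);
    	return 0;
    }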
    Now I was expecting the output to be 0 (all the bits in the file are 0, and fread would read 4 bytes). But the output I am getting is a very large number:

    Data that was read from the binary file is 808464432Press any key to continue . . .
    Why is that? Am I doing something wrong? I am just opening the file in rb mode and reading 32 bits using fread.

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Perhaps you have the character (number) '0', which is 0x30 ascii.
    Mainframe assembler programmer by trade. C coder when I can.

  3. #3
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    00000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000 0000000000000
    Did you enter that into a file with a text editor? Because if you did, and your system uses ascii, then you just entered the number '48' a bunch of times.

  4. #4
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305
    Quote Originally Posted by sean View Post
    Did you enter that into a file with a text editor? Because if you did, and your system uses ascii, then you just entered the number '48' a bunch of times.
    Yes, I did use a text editor to create that file. So what should I do to get a binary file? Should I get some .bmp file and try to read that? That should act as a good substitute for my foolishness of creating a binary file by myself. :-)

  5. #5
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    Should I get some .bmp file and try to read that?
    If you want. You could also read the current file, as long as you understand that each byte is going to be an ASCII representation. Another option is to create a binary file in C: maybe output a bunch of integers, and try to read them back in.

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by roaan View Post
    Yes, I did use a text editor to create that file. So what should I do to get a binary file? Should I get some .bmp file and try to read that? That should act as a good substitute for my foolishness of creating a binary file by myself. :-)
    A binary file is exactly the same as a text file, more or less. The distinction is made because certain BYTE values represent special things in text, such as the null terminator (0), the newline (10), and the Microsoft newline, which is two bytes (13 followed by 10).

    But a BYTE has eight BITS in it, so you cannot set BIT values by typing BYTE values: a single character is a BYTE, not a BIT.

    However (I could be wrong) I do not think there are "binary file formats" that use bit values beyond their place in the byte. So perhaps you have been confused by the term "binary file format"? Usually, the term "binary file" just refers to a file that may contain bytes with the value 0, which are not good in a text file, since a C string* would be terminated by a 0 value. (NB. a 0 value is not the same as the character '0', which as sean implied earlier has a byte value of 48, that is: 00110000, the bits set being 32 and 16). So a text string "0011" has the byte values 48-48-49-49, qv the ASCII table.

    * and hence text strings in pretty much every other language as well.
    Last edited by MK27; 08-27-2009 at 05:50 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  7. #7
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Okay, here is explanation #2 which I will include in the book*. With a text file, the byte values are usually presumed to be characters, from the ascii table. So this sequence of bytes:

    40 145 154 154 157 32 127 157 162 154 144

    is English for "ciao mondo", which if you don't understand Italian, check the ascii table DECIMAL values (since byte values are often in hex, 0x30 -- Dino -- being 48 -- sean) to solve the mystery.

    Anyway, with a "binary file" such as an image like a .bmp, the byte values have a totally different meaning because the file is not intended to be rendered as text. These are usually treated as UNSIGNED values, which means they range 0-255 (ASCII itself only defines 0-127, but a SIGNED char ranges from -128 to 127). Remember, 10000000 as a signed byte is -128. Unsigned, it is 128.

    Is it starting to get clearer? Really, the term "binary file" is not so good, probably a better distinction would be between "text" and "data" files.
    You could view a text file as "binary" in some context (eg, given a proper header, which is the first few dozen bytes, and renamed .bmp, you could look at noise patterns in a viewer) and binary files as text (but usually the text ends prematurely because of a zero byte, and you get lots of � stuff).

    * you did not scratch a banana here, roaan, but I can remember being confused by this also.
    Last edited by MK27; 08-27-2009 at 06:07 PM.

  8. #8
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305
    I seem to understand a little: the file I created is sort of an array of characters. The file I created is a sequence of 0's, which I was interpreting as 0 bits stored in the computer, and I was wrong in that understanding. Now you say that I can read the file I created in binary mode as well, but I have to keep in mind that each '0' I read is a set of 8 bits with the 16 and 32 bit values set. But how do I make my program understand that when it comes across the char '0', that is equivalent to 00110000?

  9. #9
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by roaan View Post
    But how do I make my program understand that when it comes across the char '0', that is equivalent to 00110000?
    It does understand that, except backwards. What literally happens is that the byte, 00110000, is understood as a char with a value of 32+16=48.
    Code:
    char X=48;
    printf("%c",X);
    The output is 0.

    Read my last post again, I added some stuff after you posted.

    Anyway, it depends what you want to do with "the data". This is why I said AFAIK there is no such thing as a "binary file" where the significant unit is actually the bit. It is always the byte, which is 8 bits. Certainly, almost all image data (like "bit"maps) is like that. Possibly a format could compress down to the bit level; this really depends on the protocol. To do that in C, I suppose you would just translate the byte value into an array of 8 booleans, or use some kind of struct with bit-field members like:
    Code:
    unsigned char bit1:1;
    unsigned char bit2:1;
    The :1 indicates one bit.
    Last edited by MK27; 08-27-2009 at 06:18 PM.

  10. #10
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305
    Quote Originally Posted by MK27 View Post
    Okay, here is explanation #2 which I will include in the book*. With a text file, the byte values are usually presumed to be characters, from the ascii table. So this sequence of bytes:

    40 145 154 154 157 32 127 157 162 154 144

    is English for "ciao mondo", which if you don't understand Italian, check the ascii table DECIMAL values (since byte values are often in hex, 0x30 -- Dino -- being 48 -- sean) to solve the mystery.
    I seem to be getting overwhelmed by what I am trying to understand. You say that the sequence of bytes you listed is equivalent to "ciao mondo". My understanding is that when I convert 'c' to its ASCII equivalent, it is supposed to be 99. But what actually gets stored in the computer is the hex equivalent of it, which should be 0x63, which in binary would be 01100011. So what is 40 implying here?

  11. #11
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by roaan View Post
    I seem to be getting overwhelmed by what I am trying to understand.
    Hey sorry, I'm getting all fancy now that I've sold more than 39,000 copies. The English translation of ciao mondo is:

    Hello World

    A bitmap header is 54 bytes. That plus the "Hello World" byte sequence and you would have a small squiggly picture.

  12. #12
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by MK27 View Post
    Okay, here is explanation #2 which I will include in the book*. With a text file, the byte values are usually presumed to be characters, from the ascii table. So this sequence of bytes:

    40 145 154 154 157 32 127 157 162 154 144

    is English for "ciao mondo", which if you don't understand Italian, check the ascii table DECIMAL values (since byte values are often in hex, 0x30 -- Dino -- being 48 -- sean) to solve the mystery.

    Anyway, with a "binary file" such as an image like a .bmp, the byte values have a totally different meaning because the file is not intended to be rendered as text. These are usually treated as UNSIGNED values, which means they range 0-255 (ASCII itself only defines 0-127, but a SIGNED char ranges from -128 to 127). Remember, 10000000 as a signed byte is -128. Unsigned, it is 128.

    Is it starting to get clearer? Really, the term "binary file" is not so good, probably a better distinction would be between "text" and "data" files.
    You could view a text file as "binary" in some context (eg, given a proper header, which is the first few dozen bytes, and renamed .bmp, you could look at noise patterns in a viewer) and binary files as text (but usually the text ends prematurely because of a zero byte, and you get lots of � stuff).

    * you did not scratch a banana here, roaan, but I can remember being confused by this also.
    I think your conversions are off, MK -- you went to a great deal of trouble to mention that characters stop at 127, then list a bunch of characters in the 140s and 150s?

  13. #13
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by tabstop View Post
    I think your conversions are off, MK -- you went to a great deal of trouble to mention that characters stop at 127, then list a bunch of characters in the 140s and 150s?
    Thanks tabstop, I will fire my current editor -- are you available??

    whoops

    I obviously mixed and matched OCTAL* and DECIMAL values, and in addition, misread the columns in the table. Wow (looking at that, I actually did it consistently!) So:

    72 101 108 108 111 32 87 111 114 108 100

    Many apologies to roaan, I hate to think I am reducing the possibility for "trust" in the world thru my silliness and screw-ups...so next time I'll stick with hex

    * why there are octal values in this table I dunno

  14. #14
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by roaan View Post
    Yes, I did use a text editor to create that file. So what should I do to get a binary file? Should I get some .bmp file and try to read that? That should act as a good substitute for my foolishness of creating a binary file by myself. :-)
    A computer encodes all of the information within it in binary. It doesn't have to be that way (you could design a decimal computer), but it just turns out that binary machines are easier, cheaper, and more reliable to manufacture, plus binary encoding has some useful mathematical properties that make it superior to perhaps any other encoding scheme. Now, a number is a number, and you can interpret it in any conceivable number base. Let's say you decide to use only three digits (0, 1 and 2) to represent a number. If you wrote out a column of numbers, ascending from zero, and then next to it the decimal equivalent, you'd get:

    Code:
    BASE-3	BASE-10
    --------	--------
    0		0
    1		1
    2		2
    10		3
    11		4
    12		5
    20		6
    21		7
    22		8
    100		9
    101		10
    102		11
    110		12
    111		13
    112		14
    120		15
    121		16
    122		17
    200		18
    201		19
    202		20
    210		21
    211		22
    212		23
    220		24
    221		25
    222		26
    1000	27

    The numbers are the same; the only difference is the representation (i.e. the number base) used.

    Now, you don't have to use digits to encode a number, either. ASCII is one such encoding (then again, so is HEX!). Instead of just numbers it employs "glyphs" (such as the ones you are reading at this moment). There are many other encoding schemes, as well. The point is, the underlying numbers are the same; only the interpretation has changed. When I say '0' you know that this is the number 48 in decimal, 30 in hex, etc. As long as we know which convention is being used there is no ambiguity (as implied by the joke: "There are 10 kinds of people in this world - those who understand binary and those who do not.").

    So at some point hardware manufacturers decided that instead of making each individual bit addressable, they'd make a group of 8 bits the smallest addressable unit, i.e. the byte (or octet). Again, it doesn't have to be that way: they could have gone with a 13-bit byte, say (and some machines in fact do use a different convention). So naturally, when you access a chunk of memory, or read from a file, it's done in blocks of 8 bits (as the smallest unit).

    Ok, so now let's say you open up a text editor and enter 10101010. It maps each glyph to the appropriate numerical value (in this case: 49, 48, 49, 48, 49, 48, 49, 48), which are then stored as binary data on the disk and in memory. Obviously then, a text editor is just a specialized ASCII-to-circuitry-encoded-binary conversion program of sorts (whereas a hex editor converts a subset of ASCII ('0'-'9', 'A'-'F') to binary (again, in circuitry)). Naturally, then, if you want to convert the subset of ASCII ('0'-'1'), a text editor is the wrong tool for the job: you need a specialized program.

    Also, you should always remember that the meaning of the binary data in a computer is strictly dependent on the program processing it, e.g. file-type A could confuse a program that processes file-type B into thinking it's working with legitimate data (the more complex the format, the less likely this is, of course), and while some programs only view the data as bytes, others may be interested in certain fields of each byte (e.g. 3-bit chunks) or perhaps some data type larger than a byte (contiguous chunks of 64-bit numbers, say). Huffman encoding is a good example: the codes in the compressed data are variable-length (for byte-sized symbols, anywhere from 1 to 255 bits each), and the lengths differ from one compressed file to another!

    Well, I hope that helps clear things up for you, anyway.
