Thread: A doubt about how data is stored in a file

  1. #1
    Registered User (joined Feb 2009, 26 posts)

    A doubt about how data is stored in a file

    Hi all,

    I have a file in which each line is a tab-separated array of floats, and there are close to 200,000 lines in it. I want a fast way to read this file. Reading it line by line was too slow, so I wanted to try reading the file in chunks (say 100 lines at a time) and extracting the values in memory.

    To do this, I first wanted to find how many bytes one line takes up in the file. Then I could fread(100 x sizeof one line). But when I tried this, I found that each line had a different size in the file. I can understand that if I read one line of the file into a string in my program and take its length, it would differ from line to line, because a float like 234.56 takes up 6 chars while 23.56 takes up only 5.

    But even when I checked how far the file position had moved after one fgets() [using ftell(fp)], I still got different sizes for different lines! So my question is: how exactly is data stored in a file? If I store a float, is it converted into a string when it is written to the file? Shouldn't 200 floats always occupy the same space, whatever their values are?

    And lastly, slightly off topic: if anyone has pointers on whether I'm going the right way about reading my file as quickly as possible, I would really appreciate your advice.

    Thanks in advance,

    Avinash
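One common alternative to fgets() for a large text file like this is to read the whole file into memory with a few big fread() calls and then parse the numbers with strtof(). This technique is a substitute and was not posted in this thread. The sketch assumes the file fits in memory, and slurp and parse_floats are hypothetical helper names. Because stdio already buffers fgets(), the gain may be modest, so time both approaches on the real file.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the rest of fp into one malloc'd,
   '\0'-terminated buffer, doubling it as needed. Returns NULL on
   allocation failure; the caller frees the result. */
char *slurp(FILE *fp, size_t *len_out)
{
    size_t cap = 1 << 16, len = 0;
    char *buf = malloc(cap + 1);            /* +1 for the terminator */
    if (!buf) return NULL;
    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap) break;               /* short read: EOF or error */
        char *bigger = realloc(buf, cap * 2 + 1);
        if (!bigger) { free(buf); return NULL; }
        buf = bigger;
        cap *= 2;
    }
    buf[len] = '\0';
    if (len_out) *len_out = len;
    return buf;
}

/* Hypothetical helper: parses up to max floats from buf into out.
   strtof() skips leading whitespace, so tabs and newlines both work
   as separators. Returns how many values were stored. */
size_t parse_floats(const char *buf, float *out, size_t max)
{
    size_t count = 0;
    const char *p = buf;
    char *end;
    while (count < max) {
        float v = strtof(p, &end);
        if (end == p) break;                /* nothing left to convert */
        out[count++] = v;
        p = end;
    }
    return count;
}
```

This sketch does not keep row boundaries. If every row has the same number of floats (cols), the value at row r, column c is out[r * cols + c].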

  2. #2
    Registered User Maz (Finland; joined Nov 2005, 194 posts)
    In a binary file, a float takes sizeof(float) bytes. In a text file, it is written as text. So if your text editor can open the file and show its contents correctly, the floats are stored as strings.
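The sketch below shows this in bytes. It writes values both ways and measures how far ftell() moves each time. binary_size and text_size are hypothetical helpers. They assume a stream opened in binary mode (tmpfile() opens one), because on a text stream the C standard does not guarantee that ftell() returns a plain byte count.

```c
#include <stdio.h>

/* Hypothetical helper: writes f as raw bytes with fwrite() and returns
   how many bytes the file position advanced: sizeof(float), whatever f is. */
long binary_size(FILE *fp, float f)
{
    long before = ftell(fp);
    fwrite(&f, sizeof f, 1, fp);
    return ftell(fp) - before;
}

/* Hypothetical helper: writes f as text with fprintf("%f") and returns
   how many bytes the file position advanced, which depends on the value. */
long text_size(FILE *fp, float f)
{
    long before = ftell(fp);
    fprintf(fp, "%f", f);
    return ftell(fp) - before;
}
```

For example, text_size() gives 8 for 2.5f ("2.500000") and 10 for 234.5f ("234.500000"). binary_size() gives sizeof(float) for both.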

  3. #3
    ATH0 quzah (joined Oct 2001, 14,826 posts)
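    If you view the file in Notepad, or with type or less or some such, can you see all the nice little numbers? Sounds to me like it's just a text-only file. Funny thing about text: lines are not the same size. You can't fseek to a computed offset in a text file. For text streams, the C standard only guarantees fseek() to offset 0 or to a value previously returned by ftell().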

    Read one line at a time, the way it was intended.
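Here is a minimal sketch of the line-at-a-time approach for a tab-separated file of floats. sum_lines and MAX_LINE are hypothetical names. The sketch assumes every line fits in MAX_LINE characters; a longer line would be split across two fgets() calls and counted twice.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_LINE 8192   /* assumption: no line is longer than this */

/* Hypothetical helper: reads fp one line at a time with fgets(),
   parses every float on each line with strtof(), and adds them up.
   Returns the number of lines read; *sum receives the total. */
long sum_lines(FILE *fp, double *sum)
{
    char line[MAX_LINE];
    long lines = 0;
    *sum = 0.0;
    while (fgets(line, sizeof line, fp)) {
        const char *p = line;
        char *end;
        for (;;) {
            float v = strtof(p, &end);  /* skips tabs before the number */
            if (end == p) break;        /* no more numbers on this line */
            *sum += v;
            p = end;
        }
        lines++;
    }
    return lines;
}
```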


    Quzah.
    Hope is the first step on the road to disappointment.

  4. #4
    Registered User (joined Feb 2009, 26 posts)
    Thanks for the replies, guys. One point I forgot to mention is that the file is created by another program I wrote. After seeing your replies, I went back to that program and changed it to open the output file in binary mode: fopen(filename, "wb"). But if I open the output file in Notepad, I can still see the "nice little numbers". So even when I open the file in "wb" mode, am I only creating a text file? How can that be? I am using fprintf to write to the file. Is that what is causing the problem?

  5. #5
    ATH0 quzah (joined Oct 2001, 14,826 posts)
    Two reasons:

    1 - There's no difference between "binary mode" and "text mode" on many environments (POSIX systems such as Linux, for example).
    2 - You are writing them as text, not as the bytes which make up the number.

    Consider:
    Code:
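    #include <stdio.h>

    int main( void )
    {
        float f = 1.23456789f;
        FILE *fp1, *fp2;

        fp1 = fopen( "fp1.dat", "wb" );  /* raw bytes: binary mode */
        fp2 = fopen( "fp2.dat", "w" );   /* text */
        if( fp1 )
        {
            /* Stores the sizeof(float) bytes that make up f */
            fwrite( &f, sizeof( f ), 1, fp1 );
            fclose( fp1 );
        }
        if( fp2 )
        {
            /* Stores the characters "1.234568" */
            fprintf( fp2, "%f", f );
            fclose( fp2 );
        }
        return 0;
    }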
    They do not produce the same results. fp1.dat holds the sizeof(float) raw bytes of f, while fp2.dat holds the text 1.234568.
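Reading the two representations back shows the practical difference. This is a minimal sketch that assumes fp is a new, empty stream opened for update in binary mode (for example with tmpfile()). roundtrip_binary and roundtrip_text are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical helper: writes f as raw bytes, rewinds, and reads the
   bytes back with fread(). Returns 1 on success; the value comes back
   bit-for-bit identical. */
int roundtrip_binary(FILE *fp, float f, float *back)
{
    if (fwrite(&f, sizeof f, 1, fp) != 1) return 0;
    rewind(fp);                 /* required between writing and reading */
    return fread(back, sizeof *back, 1, fp) == 1;
}

/* Hypothetical helper: writes f as text with "%f" (six decimals),
   rewinds, and parses it back with fscanf(). Returns 1 on success.
   Digits past the sixth decimal are lost, so *back may differ slightly. */
int roundtrip_text(FILE *fp, float f, float *back)
{
    if (fprintf(fp, "%f", f) < 0) return 0;
    rewind(fp);
    return fscanf(fp, "%f", back) == 1;
}
```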

    Quzah.

  6. #6
    Kernel hacker (Farncombe, Surrey, England; joined Jul 2007, 15,677 posts)
    Quote, originally posted by quzah:
    Two reasons:

    1 - There's no difference between "binary mode" and "text mode" on most environments.
    Indeed. And even when there IS a difference, nothing prevents you from opening a binary file and storing text in it, e.g. writing with fprintf() and reading with fscanf(). Quite often it works just the same, even on machines that DO distinguish between binary and non-binary files (aka text files).

    The most common difference between binary and non-binary files is how line endings and special "end of file" characters are treated. In binary mode, whatever you give to the FILE is stored as it is. In text mode, '\n' is translated to the machine's line ending. That may be '\n' (Linux and Unix, for example), '\r' (Mac OS before Mac OS X), or "\r\n" (Windows, DOS, OS/2, VAX/VMS and several other OSes).
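To check which convention a given file actually uses, open it with "rb" so the library does no translation, then count the raw endings. This is a minimal sketch, and count_line_endings is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: counts "\n" (Unix), "\r\n" (Windows/DOS) and
   lone "\r" (old Mac OS) line endings in a stream opened with "rb". */
void count_line_endings(FILE *fp, long *lf, long *crlf, long *cr)
{
    int c, prev = EOF;
    *lf = *crlf = *cr = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (prev == '\r') (*crlf)++;    /* "\r\n" pair */
            else (*lf)++;                   /* bare "\n" */
        } else if (prev == '\r') {
            (*cr)++;                        /* "\r" not followed by "\n" */
        }
        prev = c;
    }
    if (prev == '\r') (*cr)++;              /* file ended with a lone "\r" */
}
```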

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #7
    Registered User (joined Feb 2009, 26 posts)
    OK, so basically what you are saying is that the open mode doesn't really matter; what matters is whether I use fprintf() or fwrite() to write to the file. That makes sense, because fprintf() applies all the format specifiers.

    Thanks a lot for the help guys..
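The original goal of reading 100 lines with one fread() only works when every row has the same byte size, and writing with fwrite() gives exactly that. This is a minimal sketch under stated assumptions: each row holds exactly COLS = 200 floats (the "200 floats" from the first post), and the file is read back on a machine with the same float format and byte order. write_row, read_rows and seek_row are hypothetical names.

```c
#include <stdio.h>

#define COLS 200   /* assumption: every row holds exactly 200 floats */

/* Hypothetical helper: appends one row as raw bytes. Returns 1 on success. */
int write_row(FILE *fp, const float row[COLS])
{
    return fwrite(row, sizeof row[0], COLS, fp) == COLS;
}

/* Hypothetical helper: reads up to max_rows whole rows with a single
   fread() call (e.g. 100 at a time). Returns the number of rows read. */
size_t read_rows(FILE *fp, float (*rows)[COLS], size_t max_rows)
{
    return fread(rows, sizeof rows[0], max_rows, fp);
}

/* Hypothetical helper: jumps straight to row k, since every row is
   exactly sizeof(float[COLS]) bytes. Returns 1 on success. */
int seek_row(FILE *fp, long k)
{
    return fseek(fp, k * (long)sizeof(float[COLS]), SEEK_SET) == 0;
}
```

Open such a file with "wb" for writing and "rb" for reading so that no line-ending translation can alter the bytes.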

  8. #8
    Kernel hacker (Farncombe, Surrey, England; joined Jul 2007, 15,677 posts)
    Quote, originally posted by avi2886:
    OK, so basically what you are saying is that the open mode doesn't really matter; what matters is whether I use fprintf() or fwrite() to write to the file. That makes sense, because fprintf() applies all the format specifiers.

    Thanks a lot for the help guys..
    Yes, essentially so.

    Detail: Well, it DOES matter, but only in the sense that line endings (and the like) written by fprintf() or similar to a file opened in binary mode may not look like correct line endings in the actual file.

    The other way around is actually worse, in the cases where C's line ending differs from the line ending used in the file. In C, a line is always supposed to end in newline, '\n', according to the standard library definitions. This applies only to the "stdio.h" functions, not to low-level native APIs. Say you use fwrite() to write the value 10 in binary form to a file opened in text mode. The byte 10 happens to be newline. On a system such as Windows, where a line ending is "\r\n", the C library therefore translates it to '\r', '\n', because that is what is supposed to be stored in the file. If the application reading the file doesn't make the same translation in reverse, you have a problem: you've just added a byte to your data!

    So the "b" in the file mode is not useless; it just doesn't decide how the actual data is represented in the file. You choose the right function, fwrite() or fprintf(), to determine that.
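The value-10 example can be checked directly. The sketch below counts the 0x0A bytes in an object's raw representation. On a Windows-style C library, a text-mode stream would expand each of those bytes to the two bytes "\r\n" during fwrite(). text_mode_growth is a hypothetical helper, and it assumes an ASCII-based character set where '\n' is 0x0A.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many extra bytes a "\n" -> "\r\n"
   text-mode translation would add when fwrite()-ing this object,
   i.e. how many of its raw bytes equal 0x0A ('\n' in ASCII). */
size_t text_mode_growth(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t count = 0;
    for (size_t i = 0; i < size; i++)
        if (bytes[i] == 0x0A)
            count++;
    return count;
}
```

For an int holding 10, text_mode_growth(&value, sizeof value) is 1, so the file would gain one byte. Writing with "wb" and reading back with "rb" avoids the translation.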

    --
    Mats
