Thread: size of text file without traversing

  1. #1
    Registered User
    Join Date
    Aug 2005
    Posts
    113

    size of text file without traversing

    How can we determine the size of a text file without traversing it?
    I think this question may be related to the fact that a binary file determines end-of-file by its number of characters.

  2. #2
    anonytmouse
    Yes, my avatar is stolen
    Join Date
    Dec 2002
    Posts
    2,544
    There is no portable way to do this.

    On *nix and Mac, the text size will be the same as the binary size.

    On Windows, each '\r\n' pair in the file is translated to a single '\n' when the file is read in text mode. Therefore, to determine its text size, we must traverse the file and count how many '\r\n' pairs it contains.
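    A minimal C sketch of that traversal follows. It assumes that '\r\n' to '\n' is the only text-mode translation that matters and ignores other Windows text-mode quirks, such as treating Ctrl-Z as end-of-file on input. The function name windows_text_size is hypothetical. The function reads the raw bytes in binary mode and subtracts one for every '\r' that is immediately followed by '\n'. A lone '\r' is counted as an ordinary character.

```c
#include <stdio.h>

/* Hypothetical helper: estimate the size a Windows text-mode reader
 * would see by reading raw bytes in binary mode and subtracting one
 * for every '\r' immediately followed by '\n'.
 * Assumption: CRLF -> LF is the only translation considered; other
 * text-mode quirks (e.g. Ctrl-Z as end-of-file) are ignored.
 * Returns -1 if the file cannot be opened. */
long windows_text_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long bytes = 0;
    long crlf_pairs = 0;
    int prev = EOF;
    int c;

    if (fp == NULL)
        return -1;

    /* Every byte must be inspected: there is no shortcut. */
    while ((c = getc(fp)) != EOF) {
        bytes++;
        if (prev == '\r' && c == '\n')
            crlf_pairs++;
        prev = c;
    }
    fclose(fp);

    /* Each CRLF pair arrives as a single '\n' in text mode. */
    return bytes - crlf_pairs;
}
```

    For a file containing the bytes "ab\r\ncd\r\n", it returns 6: 8 bytes minus 2 CRLF pairs.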

  3. #3
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    To get the file size, just open the file, seek to the end, and get the current file pointer position. How to do that depends on whether you are using C or C++. The presence or absence of "\r\n" is not relevant to determining the size of the file, and it doesn't matter whether you open the file in binary or text mode: the file size will be the same.
    Quote Originally Posted by anonytmouse
    must traverse the file to find out how many instances of '\r\n' are in the file to determine its text size.
    No. That will only determine the number of lines in the file, not the size of the file.
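    Here is a minimal C sketch of the seek-to-end method. The helper name file_size is hypothetical. The sketch opens the file in binary mode because the C standard only guarantees that ftell on a binary stream returns a byte count. On a text stream, the value is only meaningful to fseek, although on Windows it happens to be the byte offset as well. The result is a long, so on platforms where long is 32 bits this sketch cannot report sizes of 2 GB or more.

```c
#include <stdio.h>

/* Hypothetical helper: size in bytes via seek-to-end, or -1 on error.
 * Binary mode ("rb") is used because ftell on a binary stream is a
 * byte count, while on a text stream it is an opaque position.
 * The standard also notes that SEEK_END need not be meaningful on
 * binary streams; Windows and POSIX systems do support it. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size;

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size = ftell(fp);   /* -1L if the position cannot be reported */
    fclose(fp);
    return size;
}
```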
    Last edited by Ancient Dragon; 12-24-2005 at 06:27 AM.

  4. #4
    Registered User
    Join Date
    Aug 2005
    Posts
    113
    Thanks, it works.

  5. #5
    anonytmouse
    Yes, my avatar is stolen
    Join Date
    Dec 2002
    Posts
    2,544
    Quote Originally Posted by Ancient Dragon
    To get the file size, just open the file, seek to the end, and get the current file pointer position. How to do that depends on whether you are using C or C++. The presence or absence of "\r\n" is not relevant to determining the size of the file, and it doesn't matter whether you open the file in binary or text mode: the file size will be the same.
    The OP asked how to get the size of a text file as opposed to the size of a binary file. Presumably, we can take that to mean that he wishes to know how many characters can be read from the file with the text functions such as fgets, fgetc, etc.

    The text size differs from the binary size of a file on the Windows platform because of line-end translation, and it can only be obtained by reading in the entire file.

    This is demonstrated by a simple program:
    Code:
    #include <stdio.h>
    
    int main(void)
    {
    	long  bad_size  = 0;
    	long  good_size = 0;
    	FILE* hFile;
    
    	/* Open this program's own source file in text mode. */
    	hFile = fopen("filesize.c", "r");
    	if (hFile == NULL)
    	{
    		perror("fopen");
    		return 1;
    	}
    
    	/* This method will give incorrect text size. */
    	fseek(hFile, 0, SEEK_END);
    	bad_size = ftell(hFile);
    
    	/* Rewind file. */
    	fseek(hFile, 0, SEEK_SET);
    
    	/* Now traverse the file to get the correct text size. */
    	/* Note, we can read in the entire file with fgets, fscanf, etc but
    	 * fgetc is simplest for this purpose. */
    	while (fgetc(hFile) != EOF)
    	{
    		good_size++;
    	}
    
    	/* %ld is the correct conversion for a long. */
    	printf("The bad size is %ld.\n", bad_size);
    	printf("The good size is %ld.\n", good_size);
    
    	fclose(hFile);
    
    	getchar(); /* Keep the console window open. */
    	return 0;
    }
    Results:
    Code:
    The bad size is 687.
    The good size is 643.
    The good news is that the binary size of a file will typically be as large as or larger than the text size*. Therefore, your method may be used, for example, to get a size for memory allocation before reading in the file.

    * This is not guaranteed by the C standard. Charset conversion may mean that the binary size of a file is smaller than the text size on some platforms. Theoretically, a C library implementation could even compress text files on output and decompress on input.
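    The allocation idea above can be sketched in C as follows. This is a minimal sketch, not a definitive implementation. The function name read_text_file is hypothetical, and the sketch relies on the assumption, which the footnote says is not guaranteed by the standard, that the text size never exceeds the binary size. If that assumption is violated, the sketch returns NULL instead of silently truncating. The value returned by fread is the number of characters that actually arrived after line-end translation, which is the text size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole text file into a NUL-terminated
 * buffer, using the binary size (seek-to-end) as the allocation size.
 * Assumption: text size <= binary size (true for CRLF translation,
 * not guaranteed by the C standard). Returns NULL on error or if the
 * assumption is violated; the caller frees the buffer. */
char *read_text_file(const char *path, size_t *out_len)
{
    FILE *fp;
    long bin_size;
    char *buf;
    size_t n;

    /* Step 1: binary size via seek-to-end in binary mode. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0 || (bin_size = ftell(fp)) < 0) {
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    /* Step 2: allocate one extra byte for the terminating NUL. */
    buf = malloc((size_t)bin_size + 1);
    if (buf == NULL)
        return NULL;

    /* Step 3: read in text mode; n is the translated character count. */
    fp = fopen(path, "r");
    if (fp == NULL) {
        free(buf);
        return NULL;
    }
    n = fread(buf, 1, (size_t)bin_size, fp);

    /* Buffer full but more characters remain: assumption violated. */
    if (n == (size_t)bin_size && fgetc(fp) != EOF) {
        fclose(fp);
        free(buf);
        return NULL;
    }
    fclose(fp);

    buf[n] = '\0';
    if (out_len != NULL)
        *out_len = n;
    return buf;
}
```

    On Windows, a file with CRLF line endings yields a length smaller than the allocated buffer. The unused tail is simply wasted, which is the price of not traversing the file first.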

  6. #6
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    It depends on what you are calling "good size". In my opinion, the "file size" is what you call "bad size". If you are using MS-Windows, open a command prompt, cd to the directory that contains the file, and run "dir filesize.c". That will give you the same size your program reported as "bad".

    I don't know how compressed file systems will behave; I haven't used one.
    Last edited by Ancient Dragon; 12-24-2005 at 09:56 AM.

  7. #7
    Registered User
    Join Date
    Aug 2005
    Posts
    113
    True, the bad_size referred to by anonytmouse is the actual size recognised by any OS.

    Converting \r\n to \n when reading (and \n to \r\n when writing) is a feature of the standard library functions, not of the OS. Hence the OS treats \r and \n as separate characters.

    The following code demonstrates this.
    Code:
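    #include <fstream>
    #include <iostream>
    
    int main()
    {
        // Opened in the default (text) mode, as in the original test.
        std::ifstream file("c:\\windows\\desktop\\01548.txt");
        if (!file)
        {
            std::cout << "File cannot be opened\n";
        }
        else
        {
            // Seek to the end; the position there is reported as the size.
            file.seekg(0, std::ios::end);
            std::streamoff size = file.tellg();
            std::cout << size << " bytes, " << size / 1024.0 << " KB\n";
        }
    
        std::cin.get(); // Wait for Enter (replaces the non-standard conio getch()).
        return 0;
    }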
    After right-clicking the file and viewing its Properties, I see the same size as the one reported by the program above. This proves my point.
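    The write side of the translation can be observed in the same way. The following C sketch uses a hypothetical helper named bytes_after_text_write. It writes a string through a text-mode stream and then measures the resulting file in binary mode. With a Windows C library, each '\n' written in text mode is stored as "\r\n", so writing "a\nb\n" should give 6 bytes. On POSIX systems, where no translation takes place, it gives 4.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` through a text-mode stream, then
 * measure the file's raw size with a binary-mode seek-to-end.
 * Any growth beyond strlen(text) comes from the C library's newline
 * translation, not from the OS. Returns -1 on error. */
long bytes_after_text_write(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");    /* text mode: library may translate */
    long size;

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    fclose(fp);

    fp = fopen(path, "rb");         /* binary mode: raw bytes on disk */
    if (fp == NULL)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size = ftell(fp);
    fclose(fp);
    return size;
}
```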

  8. #8
    CornedBee
    Cat without Hat
    Join Date
    Apr 2003
    Posts
    8,895
    It's not even as simple as that. The size in characters depends on more than end-of-line conventions; the encoding is another big part. In UTF-16, every character needs at least two bytes. In UTF-8, a character may take up to four bytes. In Shift-JIS, some characters take one byte and others take two.
    To get the true character count of a text file, you have no choice but to decode it completely.
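    As an illustration of decoding, here is a C sketch that handles UTF-8 only. The function name count_utf8_chars is hypothetical. The function counts code points by counting every byte that is not a continuation byte (bit pattern 10xxxxxx). It assumes that the input is valid UTF-8 and that the stream was opened in binary mode. Code points are not always the same as user-perceived characters, and other encodings such as UTF-16 or Shift-JIS need their own rules.

```c
#include <stdio.h>

/* Hypothetical helper: count code points in a UTF-8 byte stream.
 * Every byte that is not a continuation byte (10xxxxxx) starts a new
 * code point. Assumes valid UTF-8; a byte-order mark, if present, is
 * counted as one code point (U+FEFF). The whole stream is still read. */
long count_utf8_chars(FILE *fp)
{
    long count = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        if ((c & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

    For the 6-byte input "h\xc3\xa9llo" (héllo), it returns 5.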
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  9. #9
    Salem
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    A VAX will make life interesting

  10. #10
    CornedBee
    Cat without Hat
    Join Date
    Apr 2003
    Posts
    8,895
    I wouldn't call that interesting, exactly.

  11. #11
    dwks
    Frequently Quite Prolix
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    You could use GetFileSize in Win32: http://www.daniweb.com/techtalkforums/thread37272.html
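    Here is a minimal C sketch of asking the OS directly. It rests on a few assumptions. The helper name os_file_size is hypothetical. On Windows, it calls GetFileSizeEx, the variant of GetFileSize that returns the full 64-bit size in a LARGE_INTEGER. So that the sketch also compiles elsewhere, it falls back to POSIX stat() on other platforms. In both cases it reports the on-disk byte count, which is the same value the seek-to-end method gives, not the text-mode character count.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/stat.h>
#endif

/* Hypothetical helper: ask the OS for the file size in bytes, without
 * opening a stdio stream. Returns -1 on error.
 * Windows: CreateFileA + GetFileSizeEx. Elsewhere: POSIX stat(). */
long long os_file_size(const char *path)
{
#ifdef _WIN32
    LARGE_INTEGER size;
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    if (!GetFileSizeEx(h, &size)) {
        CloseHandle(h);
        return -1;
    }
    CloseHandle(h);
    return (long long)size.QuadPart;
#else
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
#endif
}
```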
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.


Similar Threads

  1. Newbie homework help
    By fossage in forum C Programming
    Replies: 3
    Last Post: 04-30-2009, 04:27 PM
  2. Adventures in labyrinth generation.
    By guesst in forum Game Programming
    Replies: 8
    Last Post: 10-12-2008, 01:30 PM
  3. Formatting a text file...
    By dagorsul in forum C Programming
    Replies: 12
    Last Post: 05-02-2008, 03:53 AM
  4. Read word from text file (It is an essay)
    By forfor in forum C Programming
    Replies: 7
    Last Post: 05-08-2003, 11:45 AM
  5. Outputting String arrays in windows
    By Xterria in forum Game Programming
    Replies: 11
    Last Post: 11-13-2001, 07:35 PM