How can we determine the size of a text file without traversing it?
I think this question may be related to the fact that a binary file determines the end of file by the number of characters.
There is no portable way to do this.
On *nix and Mac, the text size will be the same as the binary size.
On Windows, each '\r\n' in the file is replaced with '\n' when read in as text. Therefore, we can see that we must traverse the file to find out how many instances of '\r\n' are in the file to determine its text size.
To get the file size, just open the file, seek to the end, and get the current file pointer position. How to do that depends on whether you are using C or C++. The presence or absence of "\r\n" is not relevant to determining the size of the file, and it doesn't matter whether you open the file in binary or text mode: the file size will be the same.
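The seek-and-tell approach described above could be sketched in C roughly as follows. The function name `file_size_bytes` is made up for illustration. The sketch opens the file in binary mode so that no line-end translation is involved, and it assumes a platform where `fseek` to `SEEK_END` is supported on binary streams (the C standard does not strictly require this) and where the size fits in a `long`.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of the file in bytes as reported
 * by ftell after seeking to the end, or -1 on failure.
 * The file is opened in binary mode ("rb") so that no line-end
 * translation takes place. */
long file_size_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size;

    if (fp == NULL)
        return -1L;

    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1L;
    }

    size = ftell(fp);   /* ftell itself returns -1L on error */
    fclose(fp);
    return size;
}
```

A caller would use it as `long n = file_size_bytes("filesize.c");` and check for a negative result before trusting the value.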
No. Traversing the file to count the instances of '\r\n' will only determine the number of lines in the file, not the size of the file.
Last edited by Ancient Dragon; 12-24-2005 at 06:27 AM.
Thanks, it works.
In reply to Ancient Dragon:
The OP asked how to get the size of a text file as opposed to the size of a binary file. Presumably, we can take that to mean that he wishes to know how many characters can be read from the file with the text functions such as fgets, fgetc, etc.
This differs from the binary size of a file on the Windows platform due to line-end translation. It can only be obtained by reading in the entire file.
This is demonstrated by a simple program:
Code:
    #include <stdio.h>

    int main(void)
    {
        long bad_size = 0;
        long good_size = 0;
        FILE* hFile;

        hFile = fopen("filesize.c", "r");
        if (hFile == NULL)
        {
            perror("filesize.c");
            return 1;
        }

        /* This method will give incorrect text size. */
        fseek(hFile, 0, SEEK_END);
        bad_size = ftell(hFile);

        /* Rewind file. */
        fseek(hFile, 0, SEEK_SET);

        /* Now traverse the file to get the correct text size. */
        /* Note, we can read in the entire file with fgets, fscanf, etc but
         * fgetc is simplest for this purpose. */
        while (fgetc(hFile) != EOF)
        {
            good_size++;
        }

        /* %ld matches the long arguments. */
        printf("The bad size is %ld.\n", bad_size);
        printf("The good size is %ld.\n", good_size);

        fclose(hFile);
        getchar();
        return 0;
    }
Results:
Code:
    The bad size is 687.
    The good size is 643.
The good news is that the binary size of a file will typically be as large as or larger than the text size*. Therefore, your method may be used to get a size for memory allocation, for example, before reading in the file.
* This is not guaranteed by the C standard. Charset conversion may mean that the binary size of a file is smaller than the text size on some platforms. Theoretically, a C library implementation could even compress text files on output and decompress on input.
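As a rough sketch of the memory-allocation idea, the size from fseek/ftell can serve as an upper bound for the buffer, while the count returned by fread gives the number of characters actually delivered in text mode. The function name `read_text_file` is hypothetical. The sketch assumes a platform where ftell on a text stream returns a byte offset and where the text size does not exceed that value; on a platform where the footnote's caveat applies, it would silently truncate the text.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file in text mode into a malloc'd,
 * NUL-terminated buffer. The size from fseek/ftell is used only as an
 * allocation bound; the number of characters actually read is stored
 * in *text_len. Returns NULL on failure. The caller frees the buffer. */
char *read_text_file(const char *path, size_t *text_len)
{
    FILE *fp = fopen(path, "r");
    long bound;
    char *buf;
    size_t n;

    if (fp == NULL)
        return NULL;

    /* Binary size, assumed to be >= the text size on this platform. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (bound = ftell(fp)) < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    buf = malloc((size_t)bound + 1);   /* +1 for the terminating NUL */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fread never writes more than 'bound' characters into the buffer. */
    n = fread(buf, 1, (size_t)bound, fp);
    buf[n] = '\0';
    fclose(fp);

    *text_len = n;
    return buf;
}
```

On Windows, `*text_len` would come out smaller than the allocated bound for a file with '\r\n' line ends, which is harmless apart from a few unused bytes.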
It depends on what you are calling "good size". In my opinion, "file size" is what you call "bad size". Now, if you are using MS-Windows, open up a command prompt, cd to the directory that contains the file, and run "dir filesize.c <Enter>". That will give you the same size that your program said was "bad".
I don't know how compressed file systems will react; I haven't used one.
Last edited by Ancient Dragon; 12-24-2005 at 09:56 AM.
True, the bad_size referred to by anonytmouse is the actual size recognised by any OS.
Conversion of \r\n to \n during reading is a feature of the standard library functions and not of the OS. Hence the OS treats \r and \n as separate characters.
The following code proves this.
Code:
    #include <iostream>
    #include <fstream>

    int main()
    {
        std::ifstream file("c:\\windows\\desktop\\01548.txt");
        if (!file)
            std::cout << "File cannot be opened";
        else
        {
            file.seekg(0, std::ios::end);
            // Read the position once and reuse it.
            std::streamoff size = file.tellg();
            std::cout << size << " bytes, " << (float)size / 1024 << " KB";
        }
        std::cin.get();   // wait for Enter (replaces the non-standard getch())
        return 0;
    }
After right-clicking the file and viewing its properties, I find the same size as generated by the above program. This proves my point.
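The claim that line-end translation happens in the standard library rather than in the OS could also be checked with a small C sketch that reads the same file twice, once through a text-mode stream and once through a binary-mode stream. The names `count_chars` and `write_sample` are invented for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters fgetc delivers when the file
 * is opened with the given mode ("r" for text, "rb" for binary).
 * Returns -1 on failure. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Hypothetical helper: writes two short lines through a text-mode stream.
 * Returns 0 on success, -1 on failure. */
int write_sample(const char *path)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    fputs("a\nb\n", fp);   /* 4 characters as seen by the program */
    return fclose(fp) == 0 ? 0 : -1;
}
```

After `write_sample("sample.txt")`, `count_chars("sample.txt", "r")` should report 4 on any platform, while `count_chars("sample.txt", "rb")` should report 6 on Windows (each '\n' stored as '\r\n') and 4 on Unix-like systems. The file on disk is the same in both reads; only the library's view of it differs.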
It's not even as simple as that. Character size depends on more than end-of-line conventions. The encoding is another big part. In UTF-16, every character needs at least two bytes. In UTF-8, a character may be as large as four bytes. In Shift-JIS, some characters are one byte each and some are two bytes.
To get the true character size of a text file, you have no choice but to decode it completely.
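To illustrate what decoding completely means for one particular encoding, here is a minimal C sketch that counts code points in a file assumed to be valid UTF-8. The function name `count_utf8_code_points` is hypothetical, and the sketch performs no validation: it simply counts every byte that is not a continuation byte (bit pattern 10xxxxxx), since each such byte starts a new code point.

```c
#include <stdio.h>

/* Hypothetical helper: counts Unicode code points in a file ASSUMED to be
 * valid UTF-8. Every byte that is not a continuation byte (10xxxxxx)
 * starts a new code point. No validation is performed.
 * Returns -1 on failure. */
long count_utf8_code_points(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: look at raw bytes */
    long count = 0;
    int c;

    if (fp == NULL)
        return -1L;

    while ((c = fgetc(fp)) != EOF) {
        if ((c & 0xC0) != 0x80)     /* not a continuation byte */
            count++;
    }

    fclose(fp);
    return count;
}
```

Note that this counts code points, not user-perceived characters (combining marks count separately), and because the file is read in binary mode a '\r\n' pair counts as two. Other encodings such as UTF-16 or Shift-JIS would need their own decoding logic.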
All the buzzt!
CornedBee
"There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
- Flon's Law
A VAX will make life interesting
I wouldn't call that interesting, exactly.
CornedBee
You could use GetFileSize in Win32: http://www.daniweb.com/techtalkforums/thread37272.html
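A rough sketch of asking the OS for the size directly might look like the following. Note the substitutions: it uses `GetFileSizeEx` (the 64-bit variant) rather than the `GetFileSize` named above, and it adds a POSIX `stat` fallback so the same code compiles outside Windows. The function name `os_file_size` is made up for illustration.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

/* Win32 branch: asks the OS for the size without reading the file.
 * Returns -1 on failure. */
long long os_file_size(const char *path)
{
    LARGE_INTEGER size;
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (h == INVALID_HANDLE_VALUE)
        return -1;
    if (!GetFileSizeEx(h, &size)) {
        CloseHandle(h);
        return -1;
    }
    CloseHandle(h);
    return (long long)size.QuadPart;
}

#else
#include <sys/stat.h>

/* POSIX branch (an addition, not part of the Win32 suggestion):
 * stat reports the size recorded by the file system.
 * Returns -1 on failure. */
long long os_file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}
#endif
```

Either branch reports the size in bytes as stored on disk, which is the binary size discussed earlier in the thread, not the number of characters a text-mode read would deliver.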
dwk
Seek and ye shall find. quaere et invenies.
"Simplicity does not precede complexity, but follows it." -- Alan Perlis
"Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
"The only real mistake is the one from which we learn nothing." -- John Powell