I've tried using sizeof and that obviously doesn't work. I'm not sure how else to do it. Any ideas?
Depends what you mean by 'size'.
- the number of lines
- the number of characters
- the number of bytes
A crude answer to the latter would be to fseek() to the end of the file, then do ftell().
Anything else means you need to read the file and count.
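A minimal sketch of the fseek()/ftell() approach, assuming the file is opened in binary mode and its size fits in a long. The name file_size_bytes is made up for illustration. The C standard does not require a binary stream to support SEEK_END meaningfully, so treat this as common practice rather than a guarantee.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of the file in bytes,
   or -1L if the file cannot be opened or measured. */
long file_size_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no newline translation */
    long size = -1L;

    if (fp == NULL)
        return -1L;

    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);           /* offset of end-of-file, in bytes */

    fclose(fp);
    return size;
}
```

Calling file_size_bytes("file.txt") should give the byte count on mainstream desktop systems. Where long is 32 bits, files of 2 GB or more will not be reported correctly.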
If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
If at first you don't succeed, try writing your phone number on the exam paper.
There are functions out there like stat() or fstat() that tell you things about a file, but the only portable way is to read the file.
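Code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("file.txt", "r");
    int c;
    long sz = 0;
    if (fp == NULL) {                 /* fopen() fails if the file is missing or unreadable */
        perror("file.txt");
        return EXIT_FAILURE;
    }
    while ((c = getc(fp)) != EOF)     /* count every character the stream delivers */
        ++sz;
    fclose(fp);
    printf("the file is %ld bytes\n", sz);
    return EXIT_SUCCESS;
}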
That depends on which size you're actually interested in. Do you want the size in bytes, or the number of characters in the file (this depends on the file's encoding)? Do you want carriage return/linefeed conversion to be applied?
Greets,
Philip
All things begin as source code.
Source code begins with an empty file.
-- Tao Te Chip
fseek() + ftell() is certainly portable - however, if you need to know how many characters the file contains, it overestimates in some situations. In Windows, DOS and several other environments, a newline in the file is actually two characters, so every line accounts for one extra character that you will never see when you read the file in text mode. However, if what you need is a "the file is no bigger than this" number, and you want it "quickly" - say to allocate space for the file with malloc() - then fseek() will be fine. It'll just give you a few percent above and beyond what you expected, and if there are a fair number of characters on each line, it will not amount to much.
Also, fread() should work OK, and reduce the call-overhead, if the file is really large (many megabytes), since fread will return the count of the number of bytes actually read.
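A rough sketch of the fread() idea, assuming binary mode and a 4096-byte buffer (both are arbitrary choices made here). The name count_bytes_fread is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a file by reading it in chunks.
   Returns the byte count, or -1L on open or read error. */
long count_bytes_fread(const char *path)
{
    unsigned char buf[4096];
    size_t n;
    long total = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1L;

    /* fread() returns how many bytes it actually read; 0 means EOF or error */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;

    if (ferror(fp))
        total = -1L;

    fclose(fp);
    return total;
}
```

Compared with a getc() loop this makes one library call per 4096 bytes instead of one per byte, although getc() is usually buffered as well, so the gain may be modest.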
If you want to count the number of characters including newline and also want to know number of lines, fgets() may work.
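Something along these lines might do for the fgets() variant. It assumes a text-mode stream with no embedded NUL bytes (strlen() would stop at one), and count_lines_and_chars is a made-up name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts lines and characters (newlines included)
   as seen through a text-mode stream. Returns 0 on success, -1 on failure. */
int count_lines_and_chars(const char *path, long *lines, long *chars)
{
    char buf[256];
    int at_line_start = 1;      /* is the next chunk the start of a new line? */
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    *lines = 0;
    *chars = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);

        *chars += (long)len;
        if (at_line_start)
            ++*lines;           /* long lines arrive in several chunks; count once */
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }

    fclose(fp);
    return 0;
}
```

A last line without a trailing newline still counts as a line here. Whether that is the right convention depends on what the count is for.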
--
Mats
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
Quote: "A crude answer to the latter would be to fseek() to the end of the file, then do ftell()."
This only works portably if the file is opened in binary mode, because of the braindead Windows end-of-line encoding.
Reading the whole file character-by-character as proposed by Meldreth computes the number of characters, depending on encoding (think UTF-8) and mode (text vs. binary). This is probably not what you want.
Greets,
Philip
EDIT: look at stat()... this is portable in terms of POSIX and gives you the size in bytes
Last edited by Snafuist; 02-16-2009 at 11:52 AM.
In reply to matsp: on a text file you can't portably fseek to the end of the file. You have to read up to that point first and then call ftell.
According to what? Have I missed something in the standard library, or something? [And whilst that may be strictly true - I do not know ALL of the standards documentation by heart, it certainly WORKS in all compilers I've ever used to do file-operations in - from Turbo C many years ago, through some Atari ST and Amiga 68K compilers and Windows compilers such as Visual Studio and gcc]
No, you won't get an ACCURATE value - so again, it goes back to what Salem stated - it depends on what the size is for, and what sort of size we are looking for. To allocate enough memory to hold the file, or tell someone "this file is about 64K long", it's fine. To tell exactly, we do indeed need to read the file from start to end.
--
Mats
In reply to matsp ("According to what?"): the standard that people on this board love so much.
Quote (C99, 7.19.9.2): "For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET."
In reply to matsp ("it certainly WORKS in all compilers I've ever used"): unless you've used all compilers that ever were, are, and will be, that doesn't mean a thing.
Quote (matsp): "To tell exactly we do indeed need to read the file from start to end."
Again, in order for that to work portably and deliver correct results, the file needs to be opened in binary mode and the locale must be told that a character is exactly as long as a byte.
Consider a text file with 10 UTF-8 characters, each of which has a size of 4 bytes. If the locale is set to UTF-8, we'll read 10 characters (and assume 10 bytes), but we're wrong by a factor of 4.
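To make the bytes-versus-characters gap concrete, here is a small sketch that counts both in one pass. It assumes the file is valid UTF-8 and opened in binary mode, and count_utf8 is a made-up name. As far as I know getc() itself hands back one byte at a time regardless of locale, so the character count has to come from decoding; here that means counting every byte that is not a continuation byte (10xxxxxx).

```c
#include <stdio.h>

/* Hypothetical helper: counts bytes and UTF-8 code points in a file.
   Assumes well-formed UTF-8. Returns 0 on success, -1 on failure. */
int count_utf8(const char *path, long *bytes, long *code_points)
{
    int c;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;

    *bytes = 0;
    *code_points = 0;
    while ((c = getc(fp)) != EOF) {
        ++*bytes;
        if ((c & 0xC0) != 0x80)     /* not a continuation byte: starts a code point */
            ++*code_points;
    }

    fclose(fp);
    return 0;
}
```

For the file described above (10 characters of 4 bytes each) this should report 40 bytes and 10 code points.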
If we want to know for sure (and be efficient), we need to ask the operating system, e.g. by using stat().
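A minimal sketch of asking the operating system, assuming a POSIX system where stat() is available. stat_file_size is a made-up name.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the file size in bytes as reported by the OS,
   or -1 if stat() fails. POSIX only, not standard C. */
long long stat_file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);               /* e.g. file does not exist */
        return -1;
    }
    return (long long)st.st_size;   /* st_size is an off_t */
}
```

This never opens or reads the file, so the cost does not depend on the file's size.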
Greets,
Philip
Eh? My copy of the standard (C99, 7.19.9.2) says nothing about text files not supporting SEEK_END. Maybe that's considered implied by binary streams not supporting it, but I don't see anything explicit like that.
Edit: Now looking further, it does say that ftell does not have to return anything meaningful on text streams (i.e., it does not have to return a size in characters like it does for binary streams) so we can't use the idea anyway, so I retire from the field covered in shame.
Last edited by tabstop; 02-16-2009 at 01:05 PM.
In reply to tabstop: wrong paragraph. Read the next one, the one that talks about text files and says "and whence shall be SEEK_SET".
Well, I read the commas the other way: either (offset shall be zero), or (all the other stuff), as opposed to either (offset shall be zero) or (offset shall be previous), and (SEEK_SET). The rationale indicates that the second reading was what was intended, so I obviously should have read that first. Mea culpa.
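For what it's worth, the pattern that wording does permit on a text stream looks roughly like this: take a position from ftell() and hand it back to fseek() with SEEK_SET. This is only a sketch, assuming lines shorter than 256 characters, and bookmark_roundtrip is a made-up name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: bookmarks the start of line 2 with ftell(), reads the
   line, seeks back to the bookmark and reads it again.
   Returns 1 if both reads match, 0 if they differ, -1 on any failure. */
int bookmark_roundtrip(const char *path)
{
    char first[256], second[256], again[256];
    long mark;
    int result = -1;
    FILE *fp = fopen(path, "r");    /* text mode on purpose */

    if (fp == NULL)
        return -1;

    if (fgets(first, sizeof first, fp) != NULL) {       /* skip line 1 */
        mark = ftell(fp);                               /* bookmark line 2 */
        if (mark != -1L
            && fgets(second, sizeof second, fp) != NULL
            && fseek(fp, mark, SEEK_SET) == 0           /* offset came from ftell() */
            && fgets(again, sizeof again, fp) != NULL)
            result = (strcmp(second, again) == 0);
    }

    fclose(fp);
    return result;
}
```

The value in mark is only a cookie for getting back to that spot. On a text stream it is not promised to be a character count, which is why it can't stand in for the file size.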