How can I tell the size of a text file?

**sharrakor** · 02-16-2009

I've tried using sizeof and that obviously doesn't work. I'm not sure how else to do it. Any ideas?

**Salem** · 02-16-2009

Depends what you mean by 'size'.
- the number of lines
- the number of characters
- the number of bytes

A crude answer to the latter would be to fseek() to the end of the file, then do ftell().
Anything else means you need to read the file and count.
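
A minimal sketch of the fseek()/ftell() approach, assuming the size wanted is the byte count. The function name `file_size_bytes` is made up for illustration. The file is opened in binary mode because ftell() is only guaranteed to return a byte offset on a binary stream. The C standard does not strictly require binary streams to support SEEK_END, although common hosted implementations do.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of the file in bytes,
   or -1 on failure. Binary mode makes ftell() report a byte offset. */
long file_size_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size = -1;

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);   /* ftell() itself returns -1L on error */
    fclose(fp);
    return size;
}
```

Calling `file_size_bytes("file.txt")` would give the byte count, or -1 if the file cannot be opened or the seek fails.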

**Meldreth** · 02-16-2009

there are functions out there like stat or fstat that tell you things about a file. the only portable way is to read the file.

Code:

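#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("file.txt", "r");
    int c;
    long sz = 0;

    /* fopen returns NULL if the file cannot be opened */
    if (fp == NULL) {
        perror("file.txt");
        return EXIT_FAILURE;
    }
    /* count every character the text stream delivers */
    while ((c = getc(fp)) != EOF) ++sz;
    fclose(fp);
    printf("the file is %ld bytes\n", sz);
    return EXIT_SUCCESS;
}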

**Snafuist** · 02-16-2009

That depends on which size you're actually interested in. Do you want the size in bytes, the number of characters in the file (this depends on the file's encoding), do you want carriage return/linefeed conversion?

Greets,
Philip

**matsp** · 02-16-2009

Originally Posted by Meldreth

there are functions out there like stat or fstat that tell you things about a file. the only portable way is to read the file.

fseek() + ftell() is certainly portable. However, it overestimates in some situations if you need to know how many characters the file contains. In Windows, DOS and several other environments, a newline in the file is actually two characters, so every line accounts for one extra character that you never see when you read the file in text mode. If what you need is a "the file is no bigger than this" number, and you want it quickly (say, to allocate space for the file with malloc()), then fseek() will be fine: it will just give you a few percent more than you expected. And if there are a fair number of characters on each line, the difference will not amount to much.

Also, fread() should work OK, and reduce the call-overhead, if the file is really large (many megabytes), since fread will return the count of the number of bytes actually read.
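
As a rough sketch of the fread() idea, assuming a byte count is what is wanted: the function name `count_bytes_fread` and the 4096-byte buffer size are arbitrary choices for illustration, not anything from the thread.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a file by reading it in
   fixed-size chunks. Returns the count, or -1 if the file cannot be
   opened or a read error occurs. */
long count_bytes_fread(const char *path)
{
    char buf[4096];
    size_t n;
    long total = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    /* With an element size of 1, fread() returns the number of bytes read. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;
    if (ferror(fp))
        total = -1;
    fclose(fp);
    return total;
}
```

Opening with "r" instead of "rb" would count the characters seen after any newline translation rather than the bytes stored on disk.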

If you want to count the number of characters including newline and also want to know number of lines, fgets() may work.
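
One way the fgets() suggestion might look, as a sketch only. The name `count_chars_and_lines` and the 256-byte buffer are assumptions for illustration. It assumes the file contains no embedded NUL bytes (strlen() would stop early on them), and a final line without a trailing newline is not counted as a line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts characters (newlines included) and lines
   in a text file. Returns 0 on success, -1 if the file cannot be opened. */
int count_chars_and_lines(const char *path, long *chars, long *lines)
{
    char buf[256];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    *chars = 0;
    *lines = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        *chars += (long)len;
        /* A long line arrives in several pieces; only the piece that
           ends in '\n' completes a line. */
        if (len > 0 && buf[len - 1] == '\n')
            ++*lines;
    }
    fclose(fp);
    return 0;
}
```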

--
Mats

**Snafuist** · 02-16-2009

A crude answer to the latter would be to fseek() to the end of the file, then do ftell().

This only works portably if the file is opened in binary mode, because of the braindead Windows end-of-line encoding.

Reading the whole file character-by-character as proposed by Meldreth computes the number of characters, depending on encoding (think UTF-8) and mode (text vs. binary). This is probably not what you want.

Greets,
Philip

EDIT: look at stat()... this is portable in terms of POSIX and gives you the size in bytes
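
A minimal sketch of the stat() route, assuming a POSIX system (stat() is not part of standard C). The wrapper name `file_size_stat` is made up for illustration.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the size in bytes reported by the
   operating system, or -1 if stat() fails. POSIX only. */
long long file_size_stat(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;   /* st_size has type off_t */
}
```

Because this asks the file system directly, the file is never opened or read, so the cost does not grow with the file's length.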

**Meldreth** · 02-16-2009

Originally Posted by matsp

fseek() + ftell() is certainly portable

on a text file you can't portably fseek to the end of a file without reading up to that point first and then calling ftell.

**matsp** · 02-16-2009

Originally Posted by Meldreth

on a text file you can't portably fseek to the end of a file without reading up to that point first and then calling ftell.

According to what? Have I missed something in the standard library, or something? [And whilst that may be strictly true - I do not know ALL of the standards documentation by heart, it certainly WORKS in all compilers I've ever used to do file-operations in - from Turbo C many years ago, through some Atari ST and Amiga 68K compilers and Windows compilers such as Visual Studio and gcc]

No, you won't get an ACCURATE value - so again, it goes back to what Salem stated - it depends on what the size is for, and what sort of size we are looking for. To allocate enough memory to hold the file, or to tell someone "this file is about 64K long", it's fine. To tell exactly, we do indeed need to read the file from start to end.

--
Mats

**Meldreth** · 02-16-2009

Originally Posted by matsp

According to what?

the standard that people on this board love so much.

For a text stream, either offset shall be zero, or offset shall be a value returned by
an earlier successful call to the ftell function on a stream associated with the same file
and whence shall be SEEK_SET.

Originally Posted by matsp

it certainly WORKS in all compilers I've ever used to do file-operations in

unless you've used all compilers that ever were, are, and will be, that doesn't mean a thing.
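
To make the quoted rule concrete, here is a sketch of an fseek() call on a text stream that stays within what the standard guarantees: the offset comes from an earlier ftell() on the same stream and whence is SEEK_SET. The function name `reread_first_line` and its parameters are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: reads the first line of a text file twice, using
   only an offset previously returned by ftell() together with SEEK_SET.
   Returns 0 on success, -1 on failure. */
int reread_first_line(const char *path, char *first, char *again, int cap)
{
    FILE *fp = fopen(path, "r");
    long start;
    int ok = -1;

    if (fp == NULL)
        return -1;
    start = ftell(fp);                        /* remember a position */
    if (start != -1L &&
        fgets(first, cap, fp) != NULL &&
        fseek(fp, start, SEEK_SET) == 0 &&    /* return to it */
        fgets(again, cap, fp) != NULL)
        ok = 0;
    fclose(fp);
    return ok;
}
```

The value in `start` is treated as an opaque token here. On a text stream it is only promised to be usable for seeking back, not to be a character or byte count.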

**Snafuist** · 02-16-2009

To tell exactly, we do indeed need to read the file from start to end.

Again, in order for that to work portably and deliver correct results, the file needs to be opened in binary mode and the locale must be told that a character is exactly as long as a byte.

Consider a text file with 10 UTF-8 characters, each of which has a size of 4 bytes. If the locale is set to UTF-8, we'll read 10 characters (and assume 10 bytes), but we're wrong by a factor of 4.
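
The gap between a byte count and a character count in UTF-8 can be seen with a small sketch. The function name `utf8_code_points` is hypothetical, and the code assumes the input is valid UTF-8. It counts code points by skipping continuation bytes, which all have the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper: counts UTF-8 code points in a NUL-terminated
   string by skipping continuation bytes (10xxxxxx).
   Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For a string holding two copies of a four-byte character such as U+1F600 (bytes F0 9F 98 80), the string is 8 bytes long while this function counts 2 characters.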

If we want to know for sure (and be efficient), we need to ask the operating system, e.g. by using stat().

Greets,
Philip

**tabstop** · 02-16-2009

Originally Posted by Meldreth

the standard that people on this board love so much.

So use offset = 0, whence = SEEK_END, and you're done. Voila!

**matsp** · 02-16-2009

Originally Posted by tabstop

So use offset = 0, whence = SEEK_END, and you're done. Voila!

Yes, but it says that you can't use SEEK_END with a text-file.

--
Mats

**tabstop** · 02-16-2009

Originally Posted by matsp

Yes, but it says that you can't use SEEK_END with a text-file.

--
Mats

Eh? My copy of the standard says

Originally Posted by C99, 7.19.9.2

A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

It says nothing about text files not supporting SEEK_END. Maybe that's considered implied by binary streams not supporting, but I don't see anything explicit like that.

Edit: Now looking further, it does say that ftell does not have to return anything meaningful on text streams (i.e., it does not have to return a size in characters like it does for binary streams) so we can't use the idea anyway, so I retire from the field covered in shame.

**Meldreth** · 02-16-2009

Originally Posted by tabstop

It says nothing about text files not supporting SEEK_END.

wrong paragraph. read the next one, the one that talks about text files and says "and whence shall be SEEK_SET".

**tabstop** · 02-16-2009

Originally Posted by Meldreth

wrong paragraph. read the next one, the one that talks about text files and says "and whence shall be SEEK_SET".

Well, I read the commas the other way: either (offset shall be zero), or (all the other stuff), as opposed to either (offset shall be zero) or (offset shall be previous), and (SEEK_SET). The rationale indicates that the second one was what was intended, so I obviously should have read that first. Mea culpa.
