Thread: How can I tell the size of a text file?

  1. #1
    Registered User
    Join Date
    Jan 2009
    Posts
    31

    How can I tell the size of a text file?

    I've tried using sizeof and that obviously doesn't work. I'm not sure how else to do it. Any ideas?

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Depends what you mean by 'size'.
    - the number of lines
    - the number of characters
    - the number of bytes

    A crude answer to the latter would be to fseek() to the end of the file, then do ftell().
    Anything else means you need to read the file and count.
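
    A minimal sketch of the "read the file and count" approach, counting bytes and lines in one pass. The helper name count_bytes_and_lines is made up for illustration, and the file is opened in binary mode so that no newline translation takes place.

```c
#include <stdio.h>

/* Hypothetical helper: counts bytes and newline characters in one pass.
   Returns 0 on success, -1 if the file cannot be opened or a read fails. */
int count_bytes_and_lines(const char *path, long *bytes, long *lines)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no newline translation */
    int c;

    if (fp == NULL)
        return -1;

    *bytes = 0;
    *lines = 0;
    while ((c = getc(fp)) != EOF) {
        ++*bytes;
        if (c == '\n')
            ++*lines;               /* one line per newline character */
    }

    if (ferror(fp)) {               /* EOF was caused by an error, not end of file */
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}
```

    A last line without a trailing newline is not counted as a line here; whether it should be is a matter of definition.
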
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Feb 2009
    Posts
    138
    there are functions out there like stat or fstat that tell you things about a file. the only portable way is to read the file.
    Code:
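    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *fp = fopen("file.txt", "r");
        long sz = 0;   /* long: a file can hold more bytes than an int can count */
        int c;

        /* fopen returns NULL if the file cannot be opened */
        if (fp == NULL) {
            perror("file.txt");
            return EXIT_FAILURE;
        }
        /* count every character read until end of file */
        while ((c = getc(fp)) != EOF) ++sz;
        fclose(fp);
        printf("the file is %ld bytes\n", sz);
        return EXIT_SUCCESS;
    }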

  4. #4
    Complete Beginner
    Join Date
    Feb 2009
    Posts
    312
    That depends on which size you're actually interested in. Do you want the size in bytes, or the number of characters in the file (this depends on the file's encoding)? Do you want carriage return/linefeed conversion?

    Greets,
    Philip
    All things begin as source code.
    Source code begins with an empty file.
    -- Tao Te Chip

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Meldreth View Post
    there are functions out there like stat or fstat that tell you things about a file. the only portable way is to read the file.
    fseek() + ftell() is certainly portable. However, it overestimates in some situations if you need to know how many characters the file contains. In Windows, DOS and several other environments, a newline in the file is actually two characters, so every line accounts for one extra character that you will never see when you read the file in text mode. However, if what you need is a "the file is no bigger than this" number, and you want it "quickly", say to allocate space for the file with malloc(), then fseek() will be fine: it'll just give you a few percent above and beyond what you expected. If there are a fair number of characters on each line, it will not amount to much.
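
    A rough sketch of that "upper bound for malloc()" idea. The helper name load_text_file is hypothetical. As the posts further down point out, the C standard does not guarantee SEEK_END on a text stream, so this relies on behaviour that common implementations happen to provide.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole text file into a malloc'd buffer.
   The fseek/ftell value is used only as an upper bound for the allocation;
   the count returned by fread is the number of characters actually read.
   Returns NULL on failure; the caller frees the buffer. */
char *load_text_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "r");
    long bound;
    char *buf;

    if (fp == NULL)
        return NULL;

    /* Not guaranteed by the standard for text streams (assumption). */
    if (fseek(fp, 0L, SEEK_END) != 0 || (bound = ftell(fp)) < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    buf = malloc((size_t)bound + 1);        /* +1 for the terminating '\0' */
    if (buf != NULL) {
        *len = fread(buf, 1, (size_t)bound, fp);
        buf[*len] = '\0';
    }
    fclose(fp);
    return buf;
}
```

    Where newlines are stored as two characters, *len ends up smaller than the ftell() value, which is exactly the overestimate described above.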

    Also, fread() should work OK, and reduce the call-overhead, if the file is really large (many megabytes), since fread will return the count of the number of bytes actually read.
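
    Something along these lines would do the counting with fread(). This is only a sketch: the helper name count_bytes_chunked and the 4096-byte chunk size are arbitrary choices, and the file is opened in binary mode so the result is a byte count.

```c
#include <stdio.h>

/* Hypothetical helper: counts bytes by reading fixed-size chunks.
   Returns the byte count, or -1 on open or read failure. */
long count_bytes_chunked(const char *path)
{
    char buf[4096];                 /* chunk size is an arbitrary choice */
    FILE *fp = fopen(path, "rb");
    long total = 0;
    size_t n;

    if (fp == NULL)
        return -1;

    /* fread returns how many bytes it actually read; 0 means EOF or error */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;

    if (ferror(fp))
        total = -1;
    fclose(fp);
    return total;
}
```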

    If you want to count the number of characters including newline and also want to know number of lines, fgets() may work.
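
    A possible fgets() version, as a sketch only. The helper name count_with_fgets and the 256-byte buffer are assumptions. Lines longer than the buffer arrive in several pieces, so a line is counted only when a piece ends in a newline, and strlen() means embedded '\0' bytes would throw the count off.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts characters (as seen in text mode) and lines.
   Returns 0 on success, -1 if the file cannot be opened. */
int count_with_fgets(const char *path, long *chars, long *lines)
{
    char buf[256];                      /* buffer size is an arbitrary choice */
    FILE *fp = fopen(path, "r");
    int ended_with_newline = 1;         /* stays 1 for an empty file */
    size_t n;

    if (fp == NULL)
        return -1;

    *chars = 0;
    *lines = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        n = strlen(buf);
        *chars += (long)n;
        /* a long line arrives in pieces; only the last piece ends in '\n' */
        ended_with_newline = (n > 0 && buf[n - 1] == '\n');
        if (ended_with_newline)
            ++*lines;
    }
    if (!ended_with_newline)
        ++*lines;                       /* final line without trailing newline */

    fclose(fp);
    return 0;
}
```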

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Complete Beginner
    Join Date
    Feb 2009
    Posts
    312
    A crude answer to the latter would be to fseek() to the end of the file, then do ftell().
    This only works portably if the file is opened in binary mode, because of the braindead Windows end-of-line encoding.
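
    For reference, a minimal sketch of the binary-mode variant. The helper name file_size_binary is made up, and returning a long means very large files are out of reach where long is 32 bits. The standard also says a binary stream need not meaningfully support SEEK_END, so this counts on the implementation doing so, which mainstream ones do.

```c
#include <stdio.h>

/* Hypothetical helper: size in bytes via fseek/ftell on a binary stream.
   Returns -1 if the file cannot be opened or the seek/tell fails. */
long file_size_binary(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* "rb": no end-of-line translation */
    long size = -1;

    if (fp == NULL)
        return -1;

    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);           /* ftell itself returns -1L on failure */

    fclose(fp);
    return size;
}
```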

    Reading the whole file character-by-character as proposed by Meldreth computes the number of characters, depending on encoding (think UTF-8) and mode (text vs. binary). This is probably not what you want.

    Greets,
    Philip

    EDIT: look at stat()... this is portable in terms of POSIX and gives you the size in bytes
    Last edited by Snafuist; 02-16-2009 at 11:52 AM.

  7. #7
    Registered User
    Join Date
    Feb 2009
    Posts
    138
    Quote Originally Posted by matsp
    fseek() + ftell() is certainly portable
    on a text file you can't portably fseek to the end of a file without reading up to that point first and then calling ftell.

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Meldreth View Post
    on a text file you can't portably fseek to the end of a file without reading up to that point first and then calling ftell.
    According to what? Have I missed something in the standard library, or something? [And whilst that may be strictly true - I do not know ALL of the standards documentation by heart, it certainly WORKS in all compilers I've ever used to do file-operations in - from Turbo C many years ago, through some Atari ST and Amiga 68K compilers and Windows compilers such as Visual Studio and gcc]

    No, you won't get an ACCURATE value - so again, it goes back to what Salem stated - it depends on what the size is for, and what sort of size we are looking for. To allocate enough memory to hold the file, or tell someone "this file is about 64K long", it's fine. To tell exactly, we do indeed need to read the file from start to end.

    --
    Mats

  9. #9
    Registered User
    Join Date
    Feb 2009
    Posts
    138
    Quote Originally Posted by matsp
    According to what?
    the standard that people on this board love so much.
    For a text stream, either offset shall be zero, or offset shall be a value returned by
    an earlier successful call to the ftell function on a stream associated with the same file
    and whence shall be SEEK_SET.
    Quote Originally Posted by matsp
    it certainly WORKS in all compilers I've ever used to do file-operations in
    unless you've used all compilers that ever were, are, and will be, that doesn't mean a thing.

  10. #10
    Complete Beginner
    Join Date
    Feb 2009
    Posts
    312
    To tell exactly, we do indeed need to read the file from start to end.
    Again, in order for that to work portably and deliver correct results, the file needs to be opened in binary mode and the locale must be told that a character is exactly as long as a byte.

    Consider a text file with 10 UTF-8 characters, each of which has a size of 4 bytes. If the locale is set to UTF-8, we'll read 10 characters (and assume 10 bytes), but we're wrong by a factor of 4.

    If we want to know for sure (and be efficient), we need to ask the operating system, e.g. by using stat().
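
    A small sketch of the stat() route. This is POSIX, not standard C, and the helper names file_size_stat and print_file_size are made up for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (POSIX only): asks the operating system for the size.
   Returns the size in bytes, or -1 if stat() fails. */
long long file_size_stat(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;   /* st_size has type off_t */
}

/* Prints the size, or the reason stat() failed. */
void print_file_size(const char *path)
{
    long long size = file_size_stat(path);

    if (size < 0)
        perror(path);
    else
        printf("%s is %lld bytes\n", path, size);
}
```

    No data is read at all here, so the cost does not depend on how big the file is.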

    Greets,
    Philip

  11. #11
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Meldreth View Post
    the standard that people on this board love so much.
    So use offset = 0, whence = SEEK_END, and you're done. Voila!

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by tabstop View Post
    So use offset = 0, whence = SEEK_END, and you're done. Voila!
    Yes, but it says that you can't use SEEK_END with a text-file.

    --
    Mats

  13. #13
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by matsp View Post
    Yes, but it says that you can't use SEEK_END with a text-file.

    Eh? My copy of the standard says
    Quote Originally Posted by C99, 7.19.9.2
    A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
    It says nothing about text files not supporting SEEK_END. Maybe that's considered implied by binary streams not supporting, but I don't see anything explicit like that.

    Edit: Now looking further, it does say that ftell does not have to return anything meaningful on text streams (i.e., it does not have to return a size in characters like it does for binary streams) so we can't use the idea anyway, so I retire from the field covered in shame.
    Last edited by tabstop; 02-16-2009 at 01:05 PM.

  14. #14
    Registered User
    Join Date
    Feb 2009
    Posts
    138
    Quote Originally Posted by tabstop
    It says nothing about text files not supporting SEEK_END.
    wrong paragraph. read the next one, the one that talks about text files and says "and whence shall be SEEK_SET".

  15. #15
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Meldreth View Post
    wrong paragraph. read the next one, the one that talks about text files and says "and whence shall be SEEK_SET".
    Well, I read the commas as the other way: either (offset shall be zero), or (all the other stuff) as opposed to either (offset shall be zero) or (offset shall be previous), and (SEEK_SET). The rationale indicates that the second one was what was intended; so I obviously should have read that first. Mea culpa.
