Thread: count signs in .txt file bug?

  1. #1
    Registered User
    Join Date
    Jun 2021
    Posts
    2

    count signs in .txt file bug?

    Hello, I wrote a small program to count the characters ("signs") in a .txt file. It looks like this:

    Code:
    #include <stdio.h>
    
    long int countSigns(FILE * test)
    {
        char current = 0;
        
        long int pos1 = ftell(test);
        fseek(test, 0, SEEK_END);
        
        long int pos2 = ftell(test);
        fseek(test, pos1, SEEK_SET);
        
        do{
            fread(&current,1,sizeof(char),test);
            
            if(current == ' ' || current == '\n'){
                pos2--;
            }
            
        }while(!feof(test));
        
        return pos2;
    }
    
    int main()
    {
        FILE * test = fopen("test.txt", "r");
        
        printf("Zeichenzahl: %ld\n", countSigns(test));
        
        fclose(test);
    }
    The problem is that the program only partially works: it doesn't count the ' ' characters, but for some reason it does count every '\n' except the first one. I have no clue why; maybe you can help.

    Thanks in advance!

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    38,711
    Your file reading loop is wrong, because feof() doesn't work how you expect.

    In particular, you have an fread() that will fail to read something, but you don't detect that.

    Code:
    while ( fread(&current,1,sizeof(char),test) == 1 ) {
        if(current == ' ' || current == '\n'){
            pos2--;
        }
    }
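    To make the difference concrete, here is a small sketch (not from the original program; the helper names iterationsWithFeof, iterationsWithCheck and makeSample are made up for illustration). feof() only reports end of file after a read has already failed, so a loop that tests it at the bottom runs its body one extra time with whatever was last stored in the variable.

```c
#include <stdio.h>

/* Hypothetical helper: counts how often the loop body runs when the
   stream is only tested with feof() after each unchecked fread(). */
long iterationsWithFeof(FILE *f)
{
    long n = 0;
    char c = 0;
    do {
        size_t got = fread(&c, 1, 1, f);
        (void)got;            /* result deliberately not checked, as in post #1 */
        n++;                  /* the body also runs after the failed final read */
    } while (!feof(f));
    return n;
}

/* Same loop, but driven by the return value of fread(). */
long iterationsWithCheck(FILE *f)
{
    long n = 0;
    char c;
    while (fread(&c, 1, 1, f) == 1)
        n++;
    return n;
}

/* Hypothetical helper: a temporary stream holding the 3 bytes "ab\n".
   Returns NULL if the temporary file cannot be created. */
FILE *makeSample(void)
{
    FILE *f = tmpfile();
    if (f) {
        fputs("ab\n", f);
        rewind(f);
    }
    return f;
}
```

    For the three-byte sample, iterationsWithFeof() returns 4 and iterationsWithCheck() returns 3. If the last byte of the file is a '\n' or a ' ', that extra pass means one extra pos2--.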

  3. #3
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,345
    Does that actually work? It looks like the pos1 and pos2 business is just to count the number of characters in the file, then you subtract from pos2 each time... but that doesn't make sense to me. Why not just read and count the characters instead of trying some fancy tricks with number of characters and subtraction?

  4. #4
    Registered User
    Join Date
    Jun 2021
    Posts
    2
    Quote Originally Posted by Salem View Post
    Your file reading loop is wrong, because feof() doesn't work how you expect.

    In particular, you have an fread() that will fail to read something, but you don't detect that.

    Code:
    while ( fread(&current,1,sizeof(char),test) == 1 ) {
        if(current == ' ' || current == '\n'){
            pos2--;
        }
    }
    I tried that and it still counts the '\n', but I found something. If I do this:
    Code:
    do{
            fread(&current,1,sizeof(char),test);
            
            if(current == ' '){
                pos2--;
            }
            if(current == '\n'){
                pos2 -= 2;
            }
            
        }while(!feof(test));
    it actually works, so for some reason '\n' seems to be twice as large as the other chars?

  5. #5
    Registered User
    Join Date
    Dec 2017
    Posts
    1,136
    No, '\n' is not "twice as large".
    The problem is that you are reading a file with Windows line endings ('\r' '\n') in text mode, so your fread will never see the '\r' characters.
    However, the fseek and ftell have no way to ignore the '\r' chars, so they count them and therefore you need to subtract 2 for every '\n' you see.
    It would be more portable to open the file in binary mode and explicitly deal with the '\r' chars.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
     
    long int countSigns(FILE *fin)
    {
        fseek(fin, 0, SEEK_END);
        long int pos = ftell(fin);
        fseek(fin, 0, SEEK_SET);
        for (char ch; fread(&ch, 1, 1, fin) == 1; )
            if (ch == ' ' || ch == '\n' || ch == '\r') // or isspace(ch)
                --pos;
        return pos;
    }
     
    // A more usual way to do it. (Also works fine in text mode.)
    long countNonWhitespace(FILE *fin) {
        long count = 0;
        for (int c; (c = fgetc(fin)) != EOF; )
            if (!isspace(c))
                ++count;
        return count;
    }
     
    int main()
    {
        FILE *fin = fopen("count.c", "rb"); // opening in binary mode
        if (!fin) { perror("fopen"); exit(EXIT_FAILURE); }
     
        printf("count: %ld\n", countNonWhitespace(fin));
        rewind(fin);
        printf("Zeichenzahl: %ld\n", countSigns(fin));
     
        fclose(fin);
        return 0;
    }
    Last edited by john.c; 06-19-2021 at 08:37 AM.

  6. #6
    Registered User
    Join Date
    Sep 2020
    Posts
    80
    The problem is that you are reading a file with Windows line endings ('\r' '\n') in text mode, so your fread will never see the '\r' characters.
    Doesn't the C runtime replace \r\n with a single \n, or is this only in C++?

  7. #7
    Registered User
    Join Date
    Dec 2017
    Posts
    1,136
    Quote Originally Posted by thmm View Post
    Doesn't the C Runtime replace \r\n with a single \n or is this only in C++ ?
    That's the difference between "text" and "binary" modes in both C and C++.
    In text mode (on Windows), each '\r' '\n' pair is translated to a single '\n' when reading, so the '\r' chars are removed.
    In binary mode they aren't.
    The OP's problem counting newlines had to do with opening the file in text mode (so '\r' is ignored) but counting the total characters in the file using essentially binary-mode operations that do not ignore the '\r' chars (fseek, ftell).
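    As a rough illustration of that mismatch, the sketch below works on an in-memory buffer instead of a real file, so it behaves the same on every platform. textModeView is a made-up helper that only imitates the "\r\n" to '\n' translation of Windows text mode (an assumption for the demo, not the real runtime code); countBySubtraction and countByAddition are likewise hypothetical names.

```c
#include <stddef.h>

/* Imitates what a Windows text-mode read would deliver for the raw bytes:
   every "\r\n" pair becomes a single '\n'. Writes the result to out
   (which must have room for n chars) and returns its length. */
size_t textModeView(const char *raw, size_t n, char *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (raw[i] == '\r' && i + 1 < n && raw[i + 1] == '\n')
            continue;               /* drop the '\r' of a "\r\n" pair */
        out[k++] = raw[i];
    }
    return k;
}

/* The approach from post #1: start from the raw byte size and subtract
   one for every ' ' or '\n' that the text-mode view shows. */
long countBySubtraction(const char *raw, size_t n, char *scratch)
{
    long result = (long)n;
    size_t seen = textModeView(raw, n, scratch);
    for (size_t i = 0; i < seen; i++)
        if (scratch[i] == ' ' || scratch[i] == '\n')
            result--;
    return result;
}

/* Counting upwards instead: add one for every char that is seen and
   is neither ' ' nor '\n'. */
long countByAddition(const char *raw, size_t n, char *scratch)
{
    long result = 0;
    size_t seen = textModeView(raw, n, scratch);
    for (size_t i = 0; i < seen; i++)
        if (scratch[i] != ' ' && scratch[i] != '\n')
            result++;
    return result;
}
```

    For the 11 raw bytes "ab cd\r\nef\r\n", the text-mode view has 9 chars. countBySubtraction gives 11 - 1 - 2 = 8, while countByAddition gives the expected 6. The difference of 2 is exactly the two '\r' bytes that the reading loop never saw.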

  8. #8
    Registered User
    Join Date
    Sep 2020
    Posts
    80
    Thanks John, all clear now.

  9. #9
    Registered User
    Join Date
    Apr 2021
    Posts
    18
    Hm, I wanted to emphasize laserlight's point above: you do not want to compute the file size beforehand! Who knows what your function will receive as its argument? What if it receives stdin, for example? In that case you can't just "go to the end of the file and check what position you're at", because I could well be typing my input slowly, taking my time. You should just count the characters you want by incrementing a counter instead of decrementing it.
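    Something along these lines, as a minimal sketch: the struct Tally type and the tallyStream name are invented for this example. It reads the stream once, from wherever it currently is, and only ever counts upwards, so it never needs fseek() or ftell().

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type: one counter per class of character. */
struct Tally {
    long visible;     /* everything that is not whitespace */
    long whitespace;  /* ' ', '\t', '\n', '\r', ... as reported by isspace() */
};

/* Reads the stream once, from its current position to EOF, and counts
   upwards. No seeking, so it also works on streams that cannot seek. */
struct Tally tallyStream(FILE *in)
{
    struct Tally t = {0, 0};
    int c;                          /* int, so that EOF can be told apart */
    while ((c = fgetc(in)) != EOF) {
        if (isspace(c))
            t.whitespace++;
        else
            t.visible++;
    }
    return t;
}
```

    Called as tallyStream(stdin), it simply keeps counting until the input ends (Ctrl-D on a Unix-like terminal, Ctrl-Z followed by Enter on Windows), no matter how slowly the input arrives.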
