Thread: strange behaviour

  1. #1
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242

    strange behaviour

    Hi all.

    Here is my program to open a text file and count the instances of each character. The output shows the correct information, but it also displays extra information that I can't explain.

    What is the "= 1" at the bottom of the output? Sometimes it (or another number) appears twice halfway through the output instead of at the bottom. Thank you.

    Code:
    // charactercount.c: Count the number of instances of each character in a text file.
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>    // for strlen()
    
    #define MAX_FILENAME_LENGTH 30
    
    struct Character    // structure to contain character, and the number of instances of character
    {
        unsigned char ch;   // unsigned so bytes 128-255 compare equal to the value fgetc() returns
        int counter;    //number of instances of ch
    };
    
    int main(void)
    {
        struct Character list[256]; // ascii character set
        int count, ch;
        char filename[MAX_FILENAME_LENGTH];
        FILE *openfile;
    
        puts("Enter file to open");
    
        fgets(filename, MAX_FILENAME_LENGTH, stdin);
    
        filename[strlen(filename) - 1] = '\0';
    
        if((openfile = fopen(filename, "r")) != NULL)
            printf("File opened successfully\n\n");
    
        else
        {
            perror("Error");
            exit(1);
        }
    
        for(count = 0; count < 256; count++)    // initialise ch and counter to 0
        {
            list[count].ch = 48;    // 48 is ascii code for character 0
            list[count].counter = 0;
        }
    
        while((ch = fgetc(openfile)) != EOF)
        {
            for(count = 0; count < 256; count++)    // compare ch to values in array, if it already exists
                if(ch == list[count].ch)            // then increment its counter
                {
                    list[count].counter++;
                    break;
                }
    
                else
                {                                            // if ch does not exist (because its counter is 0), then assign
                    if(list[count].counter == 0)        // it to the array and increment its counter
                    {
                        list[count].ch = ch;
                        list[count].counter++;
                        break;
                    }
                }
        }
    
        for(count = 0; count < 256; count++)    // print values of structures that have a character stored
        {
            if(list[count].counter != 0)
                printf("%c = %d\n", list[count].ch, list[count].counter);
            else
                break;
        }
    
        fclose(openfile);
    
        return 0;
    }
    Sample output:
    Code:
    Enter file to open
    test1.txt
    File opened successfully
    
    e = 6
    m = 1
    a = 4
    i = 7
    l = 3
      = 8
    n = 3
    o = 3
    t = 3
    f = 1
    c = 2
    r = 2
    v = 1
    d = 2
    : = 1
    b = 3
    u = 1
    y = 3
    w = 1
    p = 2
    k = 1
    s = 1
    
    = 1

  2. #2
    Registered User
    Join Date
    Dec 2011
    Posts
    795
    Looks like newline... try printing out the integer value of all the characters to confirm this ('\n' should be 10).

  3. #3
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Yep, it's a newline. Thanks.

    But look at this output. Why does 13 appear twice in the middle, while only one '10' is printed?

    I gotta go, so thanks in advance. Will check back later.

    Code:
    Enter file to open
    test.txt
    File opened successfully
    
    101 e = 138
    109 m = 24
    97 a = 96
    105 i = 70
    108 l = 46
    32   = 237
    110 n = 76
    111 o = 74
    116 t = 89
    102 f = 13
    99 c = 37
    114 r = 44
    118 v = 11
    100 d = 27
    58 : = 6
    98 b = 22
    117 u = 45
    121 y = 46
    119 w = 17
    112 p = 24
    107 k = 12
    115 s = 57
     = 13
    10 
     = 13
    72 H = 3
    104 h = 42
    46 . = 14
    41 ) = 4
    73 I = 5
    103 g = 6
    87 W = 3
    44 , = 12
    80 P = 10
    51 3 = 1
    113 q = 8
    67 C = 7
    53 5 = 2
    45 - = 3
    54 6 = 2
    83 S = 1
    68 D = 1
    79 O = 1

  4. #4
    Registered User
    Join Date
    Dec 2011
    Posts
    795
    It's a carriage return, a result of Windows' CRLF line endings (every line ends with \r\n).

    If you're unsure of what a number is, you can always look at an ASCII table.

  5. #5
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    I know 10 is newline, but there is no number before the first "= 13". The 13 after the = is the number of instances of that blank character. Carriage return has decimal code 13, so shouldn't the output look something like "13 = 13"? And I take it the "10 = 13" line is the newline? I just want to know why there is a blank where a decimal code should be, because that is what is confusing me. Am I to assume that a CRLF has no decimal code? Where would I find it in the ASCII table? Thank you.

    EDIT: And I noticed that you mentioned Windows, but I am using Linux and gcc is my compiler. I don't know if that matters or not.
    Last edited by cfanatic; 07-13-2012 at 09:50 PM.

  6. #6
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    945
    memcpy is right. It's a carriage return character. When it's printed to the terminal it moves the cursor to the start of the line ("returns" the "carriage", a term from mechanical typewriters), and the rest of the line overwrites the 13 that was printed first.

  7. #7
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    A carriage return will move the cursor to the beginning of the line, but not move to the next line. The text typed after it will overwrite what was there before on the terminal, in this case the number.

  8. #8
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Character set is what matters, not compiler as such (as long as your compiler supports the ASCII character set).

    Printing out a character with value 13 using the %c format emits a carriage return character. It does not print the digits 1 and 3. Similar things apply to several other characters, for example other whitespace (newline, which has value 10, and tab characters), the audible bell (value 7), etc.

    The return character does not do a newline. It just moves the cursor to the start of the line. Subsequent output will overwrite what's on that line.

    If you want the digits printed out, convert the character to an int and use the %d format, not the %c format.

  9. #9
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    945
    Fun fact: *nix systems automatically convert a line feed character to a carriage return plus line feed pair when printing text to a terminal (this is the default behavior, anyway). Line feed (ASCII 10) moves the cursor down and carriage return (ASCII 13) moves the cursor to the start of that line. This is why "Unix" text files have only line feed characters at the end of each line. The MS-DOS terminal is basically a dumb terminal, so "MS-DOS" text files have CR LF at the end of each line. Now we're stuck with this dumbness in the Windows world and still have some text editors that can't even open "Unix" text files properly (e.g., Notepad).

    End of rant.

  10. #10
    Registered User
    Join Date
    Jul 2012
    Location
    Australia
    Posts
    242
    Thanks for the replies. Much clearer now.
