Thread: Understanding the word count program from K&R book

  1. #1
    Registered User
    Join Date
    Apr 2011
    Posts
    55

    Understanding the word count program from K&R book

    I have started learning C programming using the classic C book by Kernighan & Ritchie, with a couple of other books as references. While going through the solved examples I came across this program, which counts new lines, characters & words. I understood the new-line and character parts without any issues (obviously). However, the logic to count words is confusing me. Can anyone please help me decipher the word-counting part of the code?

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #define IN 1 /* inside a word */
    #define OUT 0 /* outside a word */
    /* count lines, words, and characters in input */
    int main(void) {
    int c, nl, nw, nc, state;
    state = OUT;
    nl = nw = nc = 0;
    while ((c = getchar()) != EOF) {
    ++nc;
    if (c == '\n')
    ++nl;
    if (c == ' ' || c == '\n' || c == '\t')
    state = OUT;
    else if (state == OUT) {
    state = IN;
    ++nw;
    }
    }
    printf("line:%d word:%d Character:%d\n", nl, nw, nc);
    return(0);
    }
    Last edited by alter.ego; 05-19-2011 at 04:33 AM.

  2. #2
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    It's keeping track of two states: in a word and out of a word. As soon as a whitespace character is found, state is set to OUT.

    If the character is not whitespace and state is OUT, then we are in a word, state is set to IN, and the word count variable is incremented.

  3. #3
    Registered User
    Join Date
    Apr 2011
    Posts
    55
    Quote Originally Posted by Subsonics View Post
    If the character is not white space and state is out, then we are in a word, state is set to in, and the word count variable is incremented.
    Ummm... thanks. I understood that if the character read is ' ', \n or \t, then it obviously cannot be part of a word, and state is set to 0 (OUT). But how does the program establish that a run of characters contains no whitespace? Can you please explain the quoted text in greater detail?

    What is the role of this piece of code?

    Code:
    else if (state == OUT)
    I am sorry, I know you've tried to explain it to me, but I still have some lingering doubts!

  4. #4
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Whitespace is the stuff between words. Yes, there are more whitespace characters than tab, space and newline, but that is the basic idea.

    The whole point of using this approach is to count only once per word. You asked about the check:

    Code:
    else if (state == OUT)
    Well, that makes sure you cannot take that code path more than once until new whitespace is encountered, because state is set to IN inside that block.

  5. #5
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    This is one case where text formatting helps a lot... if you indent and comment your code properly you can almost see what it's doing...

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    #define IN  1    // parsing a word
    #define OUT 0    // not parsing a word
    
    int main(void)
      {  int c = 0;        // character read from input
         int nl = 0;       // new line counter
         int nw = 0;       // number of words
         int nc = 0;       // number of characters
         int state = OUT;  // initially not inside a word
    
         while ((c = getchar()) != EOF)
           {
              ++nc;             // count characters
    
              if (c == '\n')    // count lines
                ++nl;
    
              if (c == ' ' || c == '\n' || c == '\t')
                state = OUT;    // whitespace found
              else if (state == OUT)
                {
                   state = IN;  // enter a word
                   ++nw;        // count words
                }
           }
    
         // display summary at end of input (CTRL-Z on Windows, CTRL-D on Unix)
         printf("line:%d word:%d Character:%d\n", nl, nw, nc);
         return(0);
      }

  6. #6
    Registered User
    Join Date
    Apr 2011
    Posts
    55
    Thanks for taking the time to help me out, guys! Those last few lines in the code are still bothering me. In the meantime, I used a twisted logic (no. of words = blank spaces + 1) to create a much simpler program.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    /* #define IN 1 inside a word */
    /* #define OUT 0 outside a word */
    /* count lines, words, and characters in input */
    int main(void) {
    int c, nl, nw, nc, nb, state;
    /*state = OUT;*/
    nl = nw = nc = nb = 0;
    while ((c = getchar()) != EOF) {
    ++nc;
    if (c == '\n')
    ++nl;
    if (c == ' ')
    ++nb;
    /*if (c == ' ' || c == '\n' || c == '\t')
    state = OUT;
    else if (state == OUT)
    {
    
    	state = IN;
    	++nw;
    
    }*/
    }
    printf("line:%d word:%d Character:%d\n", nl, nb+1, nc);
    return(0);
    }
    I know it is silly and will fail if someone presses the space bar twice (among other scenarios), but at least it works in 'normal' circumstances.

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Good stuff... now all you've got to learn is how to indent your code properly and toss in a few comments...

    Think of it this way... You're working on a relatively complex project... you write it like you did above... if you come back to it in 5 years are you still going to be able to follow it?

    You should cultivate good habits early... rather than fixing up the bad ones later.

  8. #8
    Registered User
    Join Date
    Apr 2011
    Posts
    55
    Will def. cultivate this habit. Clumsiness of real life will not spill over here. Thanks!
    Last edited by alter.ego; 05-20-2011 at 03:18 AM.

  9. #9
    Registered User
    Join Date
    Apr 2011
    Posts
    55
    Code:
    else if (state == OUT) {
        state = IN;
        ++nw;
    }
    Apologies for digging up this old thread, but another challenge brought me back. How does state = IN; signify that control is now in the middle of a word and that it should count it?

    I mean, when variable c holds a whitespace character it is easy to determine that it's not part of a word (it's outside a word), but how does the reverse happen? I want to understand this aspect of programming, wherein variables are used to keep track of states within the program. Any help will be greatly appreciated.

  10. #10
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Ok, here's a dead simple example that does essentially the same thing as the IN and OUT in your exercise... but with a different purpose.
    Code:
    #define On 1
    #define Off 0
    
    int IsOn = Off;
    
    // bunches of code...
    
    if (IsOn)              // test if the bulb is on
      { IsOn = Off;        // flag it as off
        TurnOffLamp(); }   // turn it off
    else
      { IsOn = On;         // flag it as on
        TurnOnLamp(); }    // turn it on
    You will find flag variables of this kind are used quite often. The flag merely directs program flow... in this case to toggle a light bulb on and off ...

  11. #11
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > while ((c = getchar()) != EOF)
    Well, ordinarily state should be OUT when you exit this loop.
    If the state is still IN, then perhaps you need to adjust the count in some way?

  12. #12
    Registered User
    Join Date
    Jun 2011
    Posts
    30

    Quote Originally Posted by CommonTater View Post
    Code:
     
              else if (state == OUT) 
                {
                   state = IN;  // enter a word
                   ++nw;        // count words
                }
    It says that if the character is not whitespace and state was OUT, then we have just entered a word, so state becomes 'IN' and the word is counted.
