Thread: what would I use to count number of words

  1. #16
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    Quote Originally Posted by brewbuck View Post
    Wouldn't work. Take the simple rule of counting the spaces and then adding one. There is one space in "Two Words," 1+1 = 2, so there are 2 words. It's simple to add the rule "except when a space is preceded by another space." But if the input string is two spaces and nothing else, there is one space which is not preceded by a space, so there is 1 counted space. Add 1 to this according to the aforementioned rule, and you get the result that a string consisting only of 2 or more spaces contains 2 words.
    I stated later that what I was saying required various assumptions on the format of the input string to hold true.

    You are correct. It won't be correct in various circumstances.

  2. #17
    Registered User
    Join Date
    Apr 2007
    Posts
    45
    would you like me to post it somewhere else?

    another forum? :-S

  3. #18
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by s_jsstevens View Post
    would you like me to post it somewhere else?

    another forum? :-S
    How about in its own thread, for starters?

  4. #19
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by MacGyver View Post
    I stated later that what I was saying required various assumptions on the format of the input string to hold true.

    You are correct. It won't be correct in various circumstances.
    You could use it if you first ran the string through a function that lopped off any leading and trailing whitespace.

  5. #20
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by brewbuck View Post
    You don't need goto...

    Code:
    #include <ctype.h>
    
    #define IN_WORD 0
    #define IN_SPACE 1
    
    int count_words(const char *str)
    {
        int nwords = 0;
        int state;
    
        if(!*str) return 0;
        if(isspace(*str)) state = IN_SPACE;
        else
        {
            state = IN_WORD;
            nwords = 1;
        }
        while(*++str)
        {
            switch(state)
            {
            case IN_SPACE:
                if(!isspace(*str))
                {
                    state = IN_WORD;
                    nwords++;
                }
                break;
            case IN_WORD:
                if(isspace(*str)) state = IN_SPACE;
                break;
            }
        }
        return nwords;
    }
    goto is more efficient for state machines and usually, for state machines easier to read than the typical switch implementation. But you did convert it to Mealy instead of my original Moore machine.

    Code:
    #include <ctype.h>
    
    #define IN_WORD 0
    #define IN_SPACE 1
    
    int count_words(const char *str)
    {
        int nwords = 0;
       
    
        if(!*str) return 0;
        if(isspace(*str)) goto IN_SPACE;
        else
        {
           nwords=1 ;
            goto IN_WORD;
            
        }
        
      
       
           IN_SPACE:
                
                if (!*++str) return(nwords) ;
                if(isspace(*str)) goto IN_SPACE ;
                else
                {
                   
                    nwords++;
                    goto IN_WORD;
                }
                
            IN_WORD:
                if (!*++str) return(nwords) ;
                if(isspace(*str)) goto IN_SPACE;
                goto In_WORD ;
            }
      
        
    }

  6. #21
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by SevenThunders View Post
    goto is more efficient for state machines and usually, for state machines easier to read than the typical switch implementation. But you did convert it to Mealy instead of my original Moore machine.
    Now you're just spewing unsubstantiated bull. The compiler is smarter than you give it credit for. And whether it's easier to read is completely subjective. Did you rip this statement directly from a textbook from 1982 or something?

  7. #22
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by brewbuck View Post
    Now you're just spewing unsubstantiated bull. The compiler is smarter than you give it credit for. And whether it's easier to read is completely subjective. Did you rip this statement directly from a textbook from 1982 or something?

    The compiler may be smart, but it's still likely to implement it using a cascade of conditionals inside the switch loop. For a state machine of this size it may be able to reduce that into something equivalent. For a much larger state machine I doubt it can approach the efficiency of the go to statements. Goto's will be implemented directly using the jmp machine instruction in an x86 architecture. switch boils down to a bunch of if then and else which invariably boils down to a cascade of conditionals and then jump statements. A good compiler will be able to predict eventually where all those jumps lead to, but there are no guarantees. I'd put my money on the goto implementation for efficiency. If code size is a metric than the goto implementation of state machines is a bit easier to read as well. But again thats just my opinon and it would probably be frowned upon in most beginning programming courses.

  8. #23
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by SevenThunders View Post
    I'd put my money on the goto implementation for efficiency.
    I'll take that challenge. How much money are we talking here?

  9. #24
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,666
    > but it's still likely to implement it using a cascade of conditionals inside the switch loop.
    And how is that any worse than the cascade of if/else + goto's that you've written?

    Most of the compilers I've come across use jump tables when the cases are 'dense' enough, say for example consecutive integers assigned to enum constants. It's only when you have 'sparse' cases ( case 1000: case 20000: ) that you're likely to get an if/else cascade.

    Since it can't be any worse than the goto horror, you may as well use it.

    Code:
    typedef enum {
        inWord,
        inSpace
    } state_t;
    
    int main( int argc, char *argv[])
    {
        state_t state;
    
        switch ( state ) {
            case inWord:
                break;
            case inSpace:
                break;
            default:
                /* very useful diagnostic trap */
                /* say adding a new state without updating the code */
                break;
        }
    
        return 0;
    }


    Besides which, your code isn't even valid
    #define IN_WORD 0
    goto IN_WORD;
    results in
    error: parse error before numeric constant
    You can't have numeric labels, or arrays of labels (except in extended gcc), so any attempt to try and match your goto's with symbolic state names is going to be hard work at best.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  10. #25
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by Salem View Post
    > but it's still likely to implement it using a cascade of conditionals inside the switch loop.
    And how is that any worse than the cascade of if/else + goto's that you've written?

    Most of the compilers I've come across use jump tables when the cases are 'dense' enough, say for example consecutive integers assigned to enum constants. It's only when you have 'sparse' cases ( case 1000: case 20000: ) that you're likely to get an if/else cascade.

    Since it can't be any worse than the goto horror, you may as well use it.

    Code:
    typedef enum {
        inWord,
        inSpace
    } state_t;
    
    int main( int argc, char *argv[])
    {
        state_t state;
    
        switch ( state ) {
            case inWord:
                break;
            case inSpace:
                break;
            default:
                /* very useful diagnostic trap */
                /* say adding a new state without updating the code */
                break;
        }
    
        return 0;
    }


    Besides which, your code isn't even valid
    #define IN_WORD 0
    goto IN_WORD;
    results in
    error: parse error before numeric constant
    You can't have numeric labels, or arrays of labels (except in extended gcc), so any attempt to try and match your goto's with symbolic state names is going to be hard work at best.
    Yup I forgot to scrub out the define statements and I didn't check to see if my code compiled sorry, but the idea isn't exactly rocket science. Jump tables are an extra indirection, though certainly an acceptable one. A naive implementation of switch will check every value in the switch argument until a match is found. Indeed there is little choice since the matches to the argument of the switch may not be small integers in consecutive order, easily packed in an array. This necessarily forces more comparisons than the direct implementation of the state machine via jumps, which does not have to check a state variable against many matches. That's because the state is inherently determined by the instruction pointer (position in the code).

    Many compilers that use C as a portable assembly language will use gotos for the improved efficiency. Finally the idea that goto is evil is primarily due to a series of articles starting with , Edsger Dijkstra in 1968. However goto definately has it's place in programming, and in my humble opinion the finite state machine is where goto just shines.

  11. #26
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by SevenThunders View Post
    Many compilers that use C as a portable assembly language will use gotos for the improved efficiency. Finally the idea that goto is evil is primarily due to a series of articles starting with , Edsger Dijkstra in 1968. However goto definately has it's place in programming, and in my humble opinion the finite state machine is where goto just shines.
    Considering that the switch statement was invented primarily for the implementation of state machines, I find this argument specious.

    As I said, what you are saying may have been true in 1982. But intelligent optimizing compilers are so widely available now that there's no excuse for using goto except in a very limited set of circumstances, and this isn't one of them. I'm baffled that you think the switch implementation looks "unclear."

  12. #27
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by brewbuck View Post
    Considering that the switch statement was invented primarily for the implementation of state machines, I find this argument specious.

    As I said, what you are saying may have been true in 1982. But intelligent optimizing compilers are so widely available now that there's no excuse for using goto except in a very limited set of circumstances, and this isn't one of them. I'm baffled that you think the switch implementation looks "unclear."
    That's news to me. I thought it was syntax sugar for if then else. I'll have to dig into the assembler code produced by some compilers for the switch statement, but I don't see how you can avoid doing a bunch of extra comparisons for the most general case, or at the very least require a layer of indirection and some kind of hashing scheme for the values of the switch argument. In general, if I say

    Code:
    switch(var) {
    case val1:
    ....
    break ;
    case val2:
    ....
    break;
    default:
    ...
    }
    How is the compiler going to know in advance that var ranges from say 0-4 so that I can implement the switch via a lookup table, especially if var is say an argument for a given function or is say initialized by the return from an external library? The answer is that the compiler can not know this and thus is must implement a comparison against val1, val2 .... etc until there is a match. Why do you think the c switch statement naturally falls through to the next case if there is no break?

    Answer: because it is implemented using a sequence of if then and else comparisons and letting it fall through is therefore very simple, nothing more than not including a jmp in assembly code to the statement after the end of the switch. I will grant you that if var can be guaranteed to be in a fixed range, then it is possible for the compiler to create more efficient code, but it will do a better job using the goto statements as should be clear. The gotos only perform the comparisons and branches required to determine the next state, whereas the switch must do this as well as the comparisons required to implement the switch itself.
    Last edited by SevenThunders; 04-27-2007 at 06:48 PM.

  13. #28
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by SevenThunders View Post
    Answer: because it is implemented using a sequence of if then and else comparisons and letting it fall through is therefore very simple, nothing more than not including a jmp in assembly code to the statement after the end of the switch. I will grant you that if var can be guaranteed to be in a fixed range, then it is possible for the compiler to create more efficient code, but it will do a better job using the goto statements as should be clear. The gotos only perform the comparisons and branches required to determine the next state, whereas the switch must do this as well as the comparisons required to implement the switch itself.
    This argument isn't even applicable to the case we're talking about. It is easily provable by the compiler that the state variable can only ever take on two different values. How else could it possibly change? It is only ever assigned the value IN_WORD or IN_SPACE. Only if the variable in the switch expression is passed from some other function, or computed by some hard-to-analyze expression, does the compiler have difficulty determining what its possible range can be. Even in that case, many compilers can perform cross-function (or even cross-module) data analysis.

    The methods for performing this kind of analysis have been known for almost 30 years. The reason older compilers didn't do it isn't because it's difficult, it's because they had pitifully small amounts of RAM and very slow clock cycles.

  14. #29
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    Quote Originally Posted by brewbuck View Post
    This argument isn't even applicable to the case we're talking about. It is easily provable by the compiler that the state variable can only ever take on two different values. How else could it possibly change? It is only ever assigned the value IN_WORD or IN_SPACE. Only if the variable in the switch expression is passed from some other function, or computed by some hard-to-analyze expression, does the compiler have difficulty determining what its possible range can be. Even in that case, many compilers can perform cross-function (or even cross-module) data analysis.

    The methods for performing this kind of analysis have been known for almost 30 years. The reason older compilers didn't do it isn't because it's difficult, it's because they had pitifully small amounts of RAM and very slow clock cycles.
    I never said that this trivial example was a good reason for this kind of optimization in C. However the general case can most certainly gain an advantage using gotos. Moreover C is kind of crappy for the kind of analysis you are talking about since it is weakly typed. The variable var may have been coerced from some other type, thus creating more uncertainty about the restrictions on the state variable. Also once you actually use an external function to help compute a value of state (an almost certainty given complex enough code) forget about it.

    So for sufficiently complex code, you have to assume the most general case. So one point to take out of this is that gotos aren't quite as evil as blindly taught in our colleges. I continue to maintain that state machines are a great reason to use the much maligned goto.

  15. #30
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by SevenThunders View Post
    The variable var may have been coerced from some other type, thus creating more uncertainty about the restrictions on the state variable.
    As I said, unless the variable was passed from another function, the compiler can determine whether this is the case.

    Also once you actually use an external function to help compute a value of state (an almost certainty given complex enough code) forget about it.
    So long as that other function is in the same source module, the compiler has no trouble figuring it out. Some compilers, like Intel's ICC compiler, can even perform such analysis across different translation units (.c files).

    So for sufficiently complex code, you have to assume the most general case.
    For sufficiently complex code, it is far more important for the code to be comprehensible than it is to be 100% "optimal." I've never encountered a case where goto caused previously unacceptably slow code to become acceptable. On the other hand, I have, many times, made simple algorithmic changes which sped up code by factors of 100 or more. And I've got several million lines of code under my belt.

    So one point to take out of this is that gotos aren't quite as evil as blindly taught in our colleges. I continue to maintain that state machines are a great reason to use the much maligned goto.
    Maintain it all you want. Whether you're correct or not, most people disagree. Being right isn't going to help you when you get into an argument. People get religious about this stuff.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. xor linked list
    By adramalech in forum C Programming
    Replies: 23
    Last Post: 10-14-2008, 10:13 AM
  2. Nim Trainer
    By guesst in forum Game Programming
    Replies: 3
    Last Post: 05-04-2008, 04:11 PM
  3. adding a number to a number
    By bigmac(rexdale) in forum C Programming
    Replies: 11
    Last Post: 10-24-2007, 12:56 PM
  4. Please Explain Count the number of bits in an int code
    By dnysveen in forum C++ Programming
    Replies: 36
    Last Post: 12-23-2006, 10:39 PM
  5. counting the number of words
    By ccoder01 in forum C Programming
    Replies: 3
    Last Post: 05-29-2004, 02:38 AM