what would I use to count number of words

Quote:

Originally Posted by brewbuck

Wouldn't work. Take the simple rule of counting the spaces and then adding one. There is one space in "Two Words," 1+1 = 2, so there are 2 words. It's simple to add the rule "except when a space is preceded by another space." But if the input string is two spaces and nothing else, there is one space which is not preceded by a space, so there is 1 counted space. Add 1 to this according to the aforementioned rule, and you get the result that a string consisting only of 2 or more spaces contains 2 words.

I stated later that what I was saying required various assumptions on the format of the input string to hold true.

You are correct. It won't be correct in various circumstances.

would you like me to post it somewhere else?

another forum? :-S

Quote:

Originally Posted by s_jsstevens

would you like me to post it somewhere else?

another forum? :-S

How about in its own thread, for starters?

Quote:

Originally Posted by MacGyver

I stated later that what I was saying required various assumptions on the format of the input string to hold true.

You are correct. It won't be correct in various circumstances.

You could use it if you first ran the string through a function that lopped off any leading and trailing whitespace.

Quote:

Originally Posted by brewbuck

You don't need goto...

Code:

#include <ctype.h> #define IN_WORD 0 #define IN_SPACE 1 int count_words(const char *str) { int nwords = 0; int state; if(!*str) return 0; if(isspace(*str)) state = IN_SPACE; else { state = IN_WORD; nwords = 1; } while(*++str) { switch(state) { case IN_SPACE: if(!isspace(*str)) { state = IN_WORD; nwords++; } break; case IN_WORD: if(isspace(*str)) state = IN_SPACE; break; } } return nwords; }

goto is more efficient for state machines and usually, for state machines easier to read than the typical switch implementation. But you did convert it to Mealy instead of my original Moore machine.

Code:

#include <ctype.h> #define IN_WORD 0 #define IN_SPACE 1 int count_words(const char *str) { int nwords = 0; if(!*str) return 0; if(isspace(*str)) goto IN_SPACE; else { nwords=1 ; goto IN_WORD; } IN_SPACE: if (!*++str) return(nwords) ; if(isspace(*str)) goto IN_SPACE ; else { nwords++; goto IN_WORD; } IN_WORD: if (!*++str) return(nwords) ; if(isspace(*str)) goto IN_SPACE; goto In_WORD ; } }

Quote:

Originally Posted by SevenThunders

goto is more efficient for state machines and usually, for state machines easier to read than the typical switch implementation. But you did convert it to Mealy instead of my original Moore machine.

Now you're just spewing unsubstantiated bull. The compiler is smarter than you give it credit for. And whether it's easier to read is completely subjective. Did you rip this statement directly from a textbook from 1982 or something?

Quote:

Originally Posted by brewbuck

Now you're just spewing unsubstantiated bull. The compiler is smarter than you give it credit for. And whether it's easier to read is completely subjective. Did you rip this statement directly from a textbook from 1982 or something?

The compiler may be smart, but it's still likely to implement it using a cascade of conditionals inside the switch loop. For a state machine of this size it may be able to reduce that into something equivalent. For a much larger state machine I doubt it can approach the efficiency of the go to statements. Goto's will be implemented directly using the jmp machine instruction in an x86 architecture. switch boils down to a bunch of if then and else which invariably boils down to a cascade of conditionals and then jump statements. A good compiler will be able to predict eventually where all those jumps lead to, but there are no guarantees. I'd put my money on the goto implementation for efficiency. If code size is a metric than the goto implementation of state machines is a bit easier to read as well. But again thats just my opinon and it would probably be frowned upon in most beginning programming courses.

Quote:

Originally Posted by SevenThunders

I'd put my money on the goto implementation for efficiency.

I'll take that challenge. How much money are we talking here?

> but it's still likely to implement it using a cascade of conditionals inside the switch loop.
And how is that any worse than the cascade of if/else + goto's that you've written?

Most of the compilers I've come across use jump tables when the cases are 'dense' enough, say for example consecutive integers assigned to enum constants. It's only when you have 'sparse' cases ( case 1000: case 20000: ) that you're likely to get an if/else cascade.

Since it can't be any worse than the goto horror, you may as well use it.

Code:

typedef enum { inWord, inSpace } state_t; int main( int argc, char *argv[]) { state_t state; switch ( state ) { case inWord: break; case inSpace: break; default: /* very useful diagnostic trap */ /* say adding a new state without updating the code */ break; } return 0; }

Besides which, your code isn't even valid
#define IN_WORD 0
goto IN_WORD;
results in
error: parse error before numeric constant
You can't have numeric labels, or arrays of labels (except in extended gcc), so any attempt to try and match your goto's with symbolic state names is going to be hard work at best.

Quote:

Originally Posted by Salem

> but it's still likely to implement it using a cascade of conditionals inside the switch loop.
And how is that any worse than the cascade of if/else + goto's that you've written?

Most of the compilers I've come across use jump tables when the cases are 'dense' enough, say for example consecutive integers assigned to enum constants. It's only when you have 'sparse' cases ( case 1000: case 20000: ) that you're likely to get an if/else cascade.

Since it can't be any worse than the goto horror, you may as well use it.

Code:

typedef enum { inWord, inSpace } state_t; int main( int argc, char *argv[]) { state_t state; switch ( state ) { case inWord: break; case inSpace: break; default: /* very useful diagnostic trap */ /* say adding a new state without updating the code */ break; } return 0; }

Besides which, your code isn't even valid
#define IN_WORD 0
goto IN_WORD;
results in
error: parse error before numeric constant
You can't have numeric labels, or arrays of labels (except in extended gcc), so any attempt to try and match your goto's with symbolic state names is going to be hard work at best.

Yup I forgot to scrub out the define statements and I didn't check to see if my code compiled sorry, but the idea isn't exactly rocket science. Jump tables are an extra indirection, though certainly an acceptable one. A naive implementation of switch will check every value in the switch argument until a match is found. Indeed there is little choice since the matches to the argument of the switch may not be small integers in consecutive order, easily packed in an array. This necessarily forces more comparisons than the direct implementation of the state machine via jumps, which does not have to check a state variable against many matches. That's because the state is inherently determined by the instruction pointer (position in the code).

Many compilers that use C as a portable assembly language will use gotos for the improved efficiency. Finally the idea that goto is evil is primarily due to a series of articles starting with , Edsger Dijkstra in 1968. However goto definately has it's place in programming, and in my humble opinion the finite state machine is where goto just shines.

Quote:

Originally Posted by SevenThunders

Many compilers that use C as a portable assembly language will use gotos for the improved efficiency. Finally the idea that goto is evil is primarily due to a series of articles starting with , Edsger Dijkstra in 1968. However goto definately has it's place in programming, and in my humble opinion the finite state machine is where goto just shines.

Considering that the switch statement was invented primarily for the implementation of state machines, I find this argument specious.

As I said, what you are saying may have been true in 1982. But intelligent optimizing compilers are so widely available now that there's no excuse for using goto except in a very limited set of circumstances, and this isn't one of them. I'm baffled that you think the switch implementation looks "unclear."

Quote:

Originally Posted by brewbuck

Considering that the switch statement was invented primarily for the implementation of state machines, I find this argument specious.

As I said, what you are saying may have been true in 1982. But intelligent optimizing compilers are so widely available now that there's no excuse for using goto except in a very limited set of circumstances, and this isn't one of them. I'm baffled that you think the switch implementation looks "unclear."

That's news to me. I thought it was syntax sugar for if then else. I'll have to dig into the assembler code produced by some compilers for the switch statement, but I don't see how you can avoid doing a bunch of extra comparisons for the most general case, or at the very least require a layer of indirection and some kind of hashing scheme for the values of the switch argument. In general, if I say

Code:

switch(var) { case val1: .... break ; case val2: .... break; default: ... }

How is the compiler going to know in advance that var ranges from say 0-4 so that I can implement the switch via a lookup table, especially if var is say an argument for a given function or is say initialized by the return from an external library? The answer is that the compiler can not know this and thus is must implement a comparison against val1, val2 .... etc until there is a match. Why do you think the c switch statement naturally falls through to the next case if there is no break?

Answer: because it is implemented using a sequence of if then and else comparisons and letting it fall through is therefore very simple, nothing more than not including a jmp in assembly code to the statement after the end of the switch. I will grant you that if var can be guaranteed to be in a fixed range, then it is possible for the compiler to create more efficient code, but it will do a better job using the goto statements as should be clear. The gotos only perform the comparisons and branches required to determine the next state, whereas the switch must do this as well as the comparisons required to implement the switch itself.

Quote:

Originally Posted by SevenThunders

Answer: because it is implemented using a sequence of if then and else comparisons and letting it fall through is therefore very simple, nothing more than not including a jmp in assembly code to the statement after the end of the switch. I will grant you that if var can be guaranteed to be in a fixed range, then it is possible for the compiler to create more efficient code, but it will do a better job using the goto statements as should be clear. The gotos only perform the comparisons and branches required to determine the next state, whereas the switch must do this as well as the comparisons required to implement the switch itself.

This argument isn't even applicable to the case we're talking about. It is easily provable by the compiler that the state variable can only ever take on two different values. How else could it possibly change? It is only ever assigned the value IN_WORD or IN_SPACE. Only if the variable in the switch expression is passed from some other function, or computed by some hard-to-analyze expression, does the compiler have difficulty determining what its possible range can be. Even in that case, many compilers can perform cross-function (or even cross-module) data analysis.

The methods for performing this kind of analysis have been known for almost 30 years. The reason older compilers didn't do it isn't because it's difficult, it's because they had pitifully small amounts of RAM and very slow clock cycles.

Quote:

Originally Posted by brewbuck

This argument isn't even applicable to the case we're talking about. It is easily provable by the compiler that the state variable can only ever take on two different values. How else could it possibly change? It is only ever assigned the value IN_WORD or IN_SPACE. Only if the variable in the switch expression is passed from some other function, or computed by some hard-to-analyze expression, does the compiler have difficulty determining what its possible range can be. Even in that case, many compilers can perform cross-function (or even cross-module) data analysis.

The methods for performing this kind of analysis have been known for almost 30 years. The reason older compilers didn't do it isn't because it's difficult, it's because they had pitifully small amounts of RAM and very slow clock cycles.

I never said that this trivial example was a good reason for this kind of optimization in C. However the general case can most certainly gain an advantage using gotos. Moreover C is kind of crappy for the kind of analysis you are talking about since it is weakly typed. The variable var may have been coerced from some other type, thus creating more uncertainty about the restrictions on the state variable. Also once you actually use an external function to help compute a value of state (an almost certainty given complex enough code) forget about it.

So for sufficiently complex code, you have to assume the most general case. So one point to take out of this is that gotos aren't quite as evil as blindly taught in our colleges. I continue to maintain that state machines are a great reason to use the much maligned goto.

Quote:

Originally Posted by SevenThunders

The variable var may have been coerced from some other type, thus creating more uncertainty about the restrictions on the state variable.

As I said, unless the variable was passed from another function, the compiler can determine whether this is the case.

Quote:

Also once you actually use an external function to help compute a value of state (an almost certainty given complex enough code) forget about it.

So long as that other function is in the same source module, the compiler has no trouble figuring it out. Some compilers, like Intel's ICC compiler, can even perform such analysis across different translation units (.c files).

Quote:

So for sufficiently complex code, you have to assume the most general case.

For sufficiently complex code, it is far more important for the code to be comprehensible than it is to be 100% "optimal." I've never encountered a case where goto caused previously unacceptably slow code to become acceptable. On the other hand, I have, many times, made simple algorithmic changes which sped up code by factors of 100 or more. And I've got several million lines of code under my belt.

Quote:

So one point to take out of this is that gotos aren't quite as evil as blindly taught in our colleges. I continue to maintain that state machines are a great reason to use the much maligned goto.

Maintain it all you want. Whether you're correct or not, most people disagree. Being right isn't going to help you when you get into an argument. People get religious about this stuff.