why is duplicate code dangerous?

**agentsmith** · 01-06-2008

Code:

do {
    x = 0;
    do {
        a = fgetc (fp);
        if (a != ' ' && a != '\n' &&
            a != EOF && a != ',' &&
            a != ';' && a != '.' &&
            a != '!' && a != '?' &&
            a != ':' && a != '`') {
            word[x] = tolower(a);
            x++;
        } else {
            if (x != 0) {
                word[x] = '\0';
                t = insertList (word, t);
                wordcount += 1;
            }
        }
    } while (a != ' ' && a != '\n' &&
             a != EOF && a != ',' &&
             a != ';' && a != '.' &&
             a != '!' && a != '?' &&
             a != ':' && a != '`');

} while (a != EOF);

Here is my function that reads characters from a .txt file, forms them into words, and puts them in a list (excluding the characters I specified). I was told that this approach is dangerous for some reason and that I should only test characters once. Why is that important, and how can I rewrite the above loop? (I need the loop to place words in a list while skipping double spaces as well. I tried doing it with one test, but double spaces were included in the list and messed up my word count.) Thanks.

**Banana Man** · 01-06-2008

Duplicate code is dangerous because if you need to change it, you might forget to change all of the duplicates too. I'd fix your loop with a function.

Code:

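/* Returns nonzero when a is an ordinary word character, i.e. NOT one of
   the separators listed below (so the name is the reverse of the logic). */
int is_punctuation(int a)
{
    return a != ' ' && a != '\n' &&
           a != EOF && a != ',' &&
           a != ';' && a != '.' &&
           a != '!' && a != '?' &&
           a != ':' && a != '`';
}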

Instead of doing the same test over and over, call the function.

Code:

do
{
    x = 0;
    do
    {
        a = fgetc (fp);
        if (is_punctuation(a))
        {
            word[x] = tolower(a);
            x++;
        }
        else
        {
            if (x != 0)
            {
                word[x] = '\0';
                t = insertList (word, t);
                wordcount += 1;
            }
        }
    }
    while (is_punctuation(a));
}
while (a != EOF);

I kind of guessed about the name for the function.
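For the "test each character only once" part of the question, here is a minimal sketch of one way it could look. It is not the original poster's code: `read_words`, `is_delimiter`, `word_fn`, and `MAX_WORD` are names made up for illustration, and the callback stands in for `insertList`, whose signature the thread does not show. The single `x != 0` check is what keeps double spaces (or any run of delimiters) from producing empty words.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAX_WORD 64  /* assumed limit, not from the thread */

/* Nonzero when c ends a word: the thread's character list, tested in one place. */
static int is_delimiter(int c)
{
    return c == ' ' || c == '\n' || c == EOF || c == ',' ||
           c == ';' || c == '.'  || c == '!' || c == '?' ||
           c == ':' || c == '`';
}

/* Hypothetical callback type standing in for the thread's insertList(). */
typedef void (*word_fn)(const char *word, void *ctx);

/* Reads fp to EOF, calls emit (if non-NULL) once per word, returns the word count. */
static int read_words(FILE *fp, word_fn emit, void *ctx)
{
    char word[MAX_WORD];
    int x = 0, wordcount = 0, a;

    assert(fp != NULL);
    do {
        a = fgetc(fp);
        if (!is_delimiter(a)) {
            if (x < MAX_WORD - 1)        /* overlong words are truncated */
                word[x++] = (char)tolower(a);
        } else if (x != 0) {             /* x == 0: a run of delimiters, skip it */
            word[x] = '\0';
            if (emit != NULL)
                emit(word, ctx);
            wordcount++;
            x = 0;
        }
    } while (a != EOF);
    return wordcount;
}
```

Each character is classified exactly once per iteration, and the only loop condition left is the EOF check, so there is no second copy of the test to keep in sync.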

**agentsmith** · 01-06-2008

thanks i will try that.

**burnsd86** · 01-06-2008

Isn't there an ispunct function in <ctype.h>?
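The standard function is `ispunct`, and `isspace` covers the whitespace cases. A rough sketch of a test built on them follows; `ends_word` is a made-up name. Note that this is not equivalent to the hand-written list in the first post: `ispunct` is true for every punctuation character in the current locale, including apostrophes and hyphens, so a word like "don't" would be split in two.

```c
#include <ctype.h>
#include <stdio.h>

/* Nonzero when c should end a word: EOF, whitespace, or punctuation. */
static int ends_word(int c)
{
    if (c == EOF)
        return 1;
    /* <ctype.h> functions need a value in unsigned char range (or EOF). */
    return isspace((unsigned char)c) || ispunct((unsigned char)c);
}
```

Digits and letters are left as word characters here; whether that is wanted depends on what should count as a word.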

**twomers** · 01-06-2008

edit: Never mind. I thought you were testing punctuation to ensure that tolower() didn't mess a , down.

Also. I don't think there's a need to check for punctuation:

Code:

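#include <ctype.h>
#include <stdio.h>

int main( void ) {
  char array[] = "ABCDefG,.12345";
  size_t i;

  /* sizeof array - 1 skips the terminating '\0' */
  for ( i=0; i<sizeof array - 1; i++ )
    printf( "%c :: %c\n", array[i], tolower((unsigned char)array[i]) );
  return 0;
}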

My output:

Code:

A :: a
B :: b
C :: c
D :: d
e :: e
f :: f
G :: g
, :: ,
. :: .
1 :: 1
2 :: 2
3 :: 3
4 :: 4
5 :: 5

Edit: Also is_punctuation should be called not_punctuation.

**matsp** · 01-07-2008

Duplicated code is "bad" in a few aspects:
1. You may end up changing only one of the two or more places that need changing.
2. It is sometimes hard to determine if the similar looking code is actually identical, or just nearly the same [and if it's actually supposed to be the same or not]. This is of course related to 1 - you may be looking for a bug, and finding that the code is "nearly identical", you may assume that it's supposed to be - but it's NOT. [This of course means that it should have a clear comment to say "This is not a duplicate of code (above, below, in file X), it is different in that ..."]
3. If the duplication is large enough, it adds to the code-space used in the executable, and thus makes the executable larger. This is one of those things that is difficult to balance - larger code MAY run faster because it's "inline", meaning that there is no jump to another function - or it may run slower because the caches are more used. In this particular case I would suggest a common function is better, because the processor will do better branch prediction based on the previous run of the same function.

--
Mats
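To illustrate points 1 and 2, one possible way to keep the separator set in a single place is to store it as a string and search it. This is only a sketch: `DELIMITERS` and `is_delimiter` are illustrative names, and the character set is copied from the first post.

```c
#include <stdio.h>
#include <string.h>

/* The whole separator set lives in this one string. */
static const char DELIMITERS[] = " \n,;.!?:`";

/* Nonzero when c is EOF or one of the characters in DELIMITERS. */
static int is_delimiter(int c)
{
    if (c == EOF)
        return 1;
    /* strchr would also match the terminating '\0', so exclude it explicitly. */
    return c != '\0' && strchr(DELIMITERS, c) != NULL;
}
```

Adding a new separator, say a tab, then means editing one string, and there is no second copy that could quietly drift out of step.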

**CornedBee** · 01-07-2008

In addition, the compiler has heuristics that may make it inline the code of the function anyway.

**matsp** · 01-07-2008

Originally Posted by CornedBee

In addition, the compiler has heuristics that may make it inline the code of the function anyway.

Yes, indeed. I should probably have added that the compiler has those. However, I'm not aware of any compiler that can do "function out of duplicated code", so if you duplicate a sufficiently large portion of code, the compiler will NOT be able to make the "don't inline this" optimization.

--
Mats

**esbo** · 01-07-2008

That looks a bit messy to me.
I would do it something like this, if I have understood the question correctly.
It's just the general idea and it has not been compiled or tested.
So don't blame me if it does not work or compile :O)
I would be surprised if it did!!
Well not that surprised ;O)

Code:

char word[MAX_NUMBER_OF_WORDS][MAX_WORD_LENGTH];

int w=0;
int  x=0;
while ( ( a = fgetc (fp)   ) != EOF){
       v= tolower(a);
       if (v>='a' && v<='z'){
            word[w,x++] = a;       
       }
        else{
	        w++; x=0;
	}
 }

**vart** · 01-07-2008

So don't blame me if it does not work or compile

But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language

**laserlight** · 01-07-2008

so if you duplicate a sufficiently large portion of code, the compiler will NOT be able to make the "don't inline this" optimization.

I am a little confused here. It seems to me that "if you duplicate a sufficiently large portion of code" by replacing them with calls of a function, the compiler will evaluate the situation as it being bad for inlining, so it will make the "don't inline this" optimisation (or to look at it another way: the compiler will not make the "inline this" optimisation), possibly even against the programmer's suggestion (e.g., with the inline keyword).

Off topic: I find it ironic that "agentsmith" (cf. the Matrix trilogy) is talking about duplicate code.

**esbo** · 01-08-2008

Originally Posted by vart

But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language

Not really, the compiler is better at that than me; it will inform me of any errors when I compile it.

But then I expect all the code you write compiles first time because you are so clever???

**vart** · 01-08-2008

it will inform me of any errors

ha-ha

**esbo** · 01-08-2008

Originally Posted by vart

But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language

OK so I put
word[w,x++] = a;
Instead of
word[w][x++] = a;

Big deal!
Seems to compile nicely now, and no doubt work ;O)

Code:

int word[MAX_NUMBER_OF_WORDS][MAX_WORD_LENGTH];

int w=0;
int  x=0;
while ( ( a = fgetc (fp)   ) != EOF){
       v= tolower(a);
       if (v>='a' && v<='z'){
            word[w][x++] = a;       
       }
        else{
	        w++; x=0;
	}
 }

**vart** · 01-08-2008

compile? work? yeah... [sarcasm]
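For reference, the second array version still leaves `a` and `v` undeclared, stores the original character rather than the lowercased one, never terminates the strings, starts a new word on every non-letter (so double spaces yield empty entries), and has no bounds checks. A hedged sketch of how the same idea might be completed is below; `split_words` is a made-up name and the two array limits are assumed values, since the thread never defines them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAX_NUMBER_OF_WORDS 100   /* assumed value */
#define MAX_WORD_LENGTH     32    /* assumed value */

/* Fills words[][] with lowercase words from fp; returns how many were stored. */
static int split_words(FILE *fp, char words[][MAX_WORD_LENGTH], int max_words)
{
    int a, w = 0, x = 0;

    assert(fp != NULL);
    while (w < max_words && (a = fgetc(fp)) != EOF) {
        if (isalpha(a)) {
            if (x < MAX_WORD_LENGTH - 1)     /* overlong words are truncated */
                words[w][x++] = (char)tolower(a);
        } else if (x > 0) {                  /* only close a word that has letters */
            words[w++][x] = '\0';
            x = 0;
        }
    }
    if (x > 0 && w < max_words)              /* input did not end with a separator */
        words[w++][x] = '\0';
    return w;
}
```

Unlike the list in the first post, this treats every non-letter as a separator, so digits and apostrophes also end a word.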
