why is duplicate code dangerous?

This is a discussion on why is duplicate code dangerous? within the C Programming forums, part of the General Programming Boards category; Code: do{ x = 0; do{ a = fgetc (fp); if (a != ' ' && a != '\n' && ...

  1. #1
    Registered User
    Join Date
    Dec 2007
    Posts
    67

    why is duplicate code dangerous?

    Code:
     do{
    	x = 0;
    	do{
    	a = fgetc (fp);
    	if (a != ' ' && a != '\n' && 
    		a != EOF && a !=',' && 
    		a !=';' && a !='.'&& 
    		a !='!' && a !='?' && 
    		a !=':' && a !='`'){
    	word[x] = tolower(a);
    	  x++;
    	  }else{
    	if (x != 0){
    		word[x] = '\0';
    	    t = insertList (word, t);
    		wordcount += 1;
    		}
    	}
          }while (a != ' ' && a != '\n' && 
    			  a != EOF && a !=',' && 
    			  a !=';' && a !='.'&& 
    			  a !='!' && a !='?' && 
    			  a !=':' && a !='`');
          
        } while (a != EOF);
    here is my function that reads in characters from a .txt file, forms them into words and puts them in a list excluding characters i specified). i was told that this approach is dangerous for some reason and that i should only test characters once. why is that important and how can i rewrite the above loop? (i need the loop to place words in a list while excluding double spaces as well. i tried doing in with one test but double spaces were included into the list and messed up my word count). thanks

  2. #2
    Registered User
    Join Date
    Jan 2008
    Posts
    58
    Duplicate code is dangerous because if you need to change it, you might forget to change all of the duplicates too. I'd fix your loop with a function.
    Code:
    int is_punctuation(int a)
    {
        return a != ' ' && a != '\n' && 
               a != EOF && a !=',' && 
               a !=';' && a !='.'&& 
               a !='!' && a !='?' && 
               a !=':' && a !='`';
    }
    instead of doing the same test over and over, call the function.
    Code:
    do
    {
        x = 0;
        do
        {
            a = fgetc (fp);
            if (is_punctuation(a))
            {
                word[x] = tolower(a);
                x++;
            }
            else
            {
                if (x != 0)
                {
                    word[x] = '\0';
                    t = insertList (word, t);
                    wordcount += 1;
                }
            }
        }
        while (is_punctuation(a));
    }
    while (a != EOF);
    I kind of guessed about the name for the function.

  3. #3
    Registered User
    Join Date
    Dec 2007
    Posts
    67
    thanks i will try that.

  4. #4
    Registered User
    Join Date
    Jan 2008
    Posts
    1
    under <ctype.h> isnt there a ispunc command?

  5. #5
    The superhaterodyne twomers's Avatar
    Join Date
    Dec 2005
    Location
    Ireland
    Posts
    2,262
    edit: Never mind. I thought you were testing punctuation to ensure that tolower() didn't mess a , down.

    Also. I don't think there's a need to check for punctuation:
    Code:
    int main( void ) {
      char array[] = "ABCDefG,.12345";
      int i;
    
      for ( i=0; i<sizeof array; i++ )
        printf( "&#37;c :: %c\n", array[i], tolower(array[i]) );
    }
    My output:
    Code:
    A :: a
    B :: b
    C :: c
    D :: d
    e :: e
    f :: f
    G :: g
    , :: ,
    . :: .
    1 :: 1
    2 :: 2
    3 :: 3
    4 :: 4
    5 :: 5
    Edit: Also is_punctuation should be called not_punctuation.
    Last edited by twomers; 01-06-2008 at 07:22 PM.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Duplicated code is "bad" in a few aspects:
    1. You may end up changing only one of the two or more places that need changing.
    2. It is sometimes hard to determine if the similar looking code is actually identical, or just nearly the same [and if it's actually supposed to be the same or not]. This is of course related to 1 - you may be looking for a bug, and finding that the code is "nearly identical", you may assume that it's supposed to be - but it's NOT. [This of course means that it should have a clear comment to say "This is not a duplicate of code (above, below, in file X), it is different in that ..."]
    3. If the duplication is large enough, it adds to the code-space used in the executable, and thus makes the executable larger. This is one of those things that is difficult to balance - larger code MAY run faster because it's "inline", meaning that there is no jump to another function - or it may run slower because the caches are more used. In this particular case I would suggest a common function is better, because the processor will do better branch prediction based on the previous run of the same function.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,893
    In addition, the compiler has heuristics that may make it inline the code of the function anyway.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by CornedBee View Post
    In addition, the compiler has heuristics that may make it inline the code of the function anyway.
    Yes, indeed. I should probably have added that the compiler has those. However, I'm not aware of any compiler that can do "function out of duplicated code", so if you duplicate a sufficiently large portion of code, the compiler will NOT be able to make the "don't inline this" optimization.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  9. #9
    Fountain of knowledge.
    Join Date
    May 2006
    Posts
    794
    Quote Originally Posted by agentsmith View Post
    Code:
     do{
    	x = 0;
    	do{
    	a = fgetc (fp);
    	if (a != ' ' && a != '\n' && 
    		a != EOF && a !=',' && 
    		a !=';' && a !='.'&& 
    		a !='!' && a !='?' && 
    		a !=':' && a !='`'){
    	word[x] = tolower(a);
    	  x++;
    	  }else{
    	if (x != 0){
    		word[x] = '\0';
    	    t = insertList (word, t);
    		wordcount += 1;
    		}
    	}
          }while (a != ' ' && a != '\n' && 
    			  a != EOF && a !=',' && 
    			  a !=';' && a !='.'&& 
    			  a !='!' && a !='?' && 
    			  a !=':' && a !='`');
          
        } while (a != EOF);
    here is my function that reads in characters from a .txt file, forms them into words and puts them in a list excluding characters i specified). i was told that this approach is dangerous for some reason and that i should only test characters once. why is that important and how can i rewrite the above loop? (i need the loop to place words in a list while excluding double spaces as well. i tried doing in with one test but double spaces were included into the list and messed up my word count). thanks
    That looks a bit messy to me.
    I would do it something like this, if I have understood the question correctly.
    It's just the general idea and it has not been compiled or tested.
    So don't blame me if it does not work or compile :O)
    I would be surprised if it did!!
    Well not that surprised ;O)


    Code:
    char word[MAX_NUMBER_OF_WORDS][MAX_WORD_LENGTH];
    
    int w=0;
    int  x=0;
    while ( ( a = fgetc (fp)   ) != EOF){
           v= tolower(a);
           if (v>='a' && v<='z'){
                word[w,x++] = a;       
           }
            else{
    	        w++; x=0;
    	}
     }

  10. #10
    CSharpener vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,484
    So don't blame me if it does not work or compile
    But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language
    The first 90% of a project takes 90% of the time,
    the last 10% takes the other 90% of the time.

  11. #11
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    22,268
    so if you duplicate a sufficiently large portion of code, the compiler will NOT be able to make the "don't inline this" optimization.
    I am a little confused here. It seems to me that "if you duplicate a sufficiently large portion of code" by replacing them with calls of a function, the compiler will evaluate the situation as it being bad for inlining, so it will make the "don't inline this" optimisation (or to look at it another way: the compiler will not make the "inline this" optimisation), possibly even against the programmer's suggestion (e.g., with the inline keyword).

    Off topic: I find it ironic that "agentsmith" (c.f. Matrix trilogy) is talking about duplicate code.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  12. #12
    Fountain of knowledge.
    Join Date
    May 2006
    Posts
    794
    Quote Originally Posted by vart View Post
    But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language
    Not really, the compiler is better at that then me, it will inform me of any errors
    when I compile it.

    But then I expect all the code you write compiles first time because you are so clever???

  13. #13
    CSharpener vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,484
    it will inform me of any errors
    ha-ha
    The first 90% of a project takes 90% of the time,
    the last 10% takes the other 90% of the time.

  14. #14
    Fountain of knowledge.
    Join Date
    May 2006
    Posts
    794
    Quote Originally Posted by vart View Post
    But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language
    OK so I put
    word[w,x++] = a;
    Instead of
    word[w][x++] = a;

    Big deal!
    Seems to compile nicely now, and no doubt work ;O)

    Code:
    int word[MAX_NUMBER_OF_WORDS][MAX_WORD_LENGTH];
    
    int w=0;
    int  x=0;
    while ( ( a = fgetc (fp)   ) != EOF){
           v= tolower(a);
           if (v>='a' && v<='z'){
                word[w][x++] = a;       
           }
            else{
    	        w++; x=0;
    	}
     }

  15. #15
    CSharpener vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,484
    compile? work? yeah... [sarcasm]
    The first 90% of a project takes 90% of the time,
    the last 10% takes the other 90% of the time.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Proposal: Code colouring
    By Perspective in forum A Brief History of Cprogramming.com
    Replies: 28
    Last Post: 05-14-2007, 08:23 AM
  2. Updated sound engine code
    By VirtualAce in forum Game Programming
    Replies: 8
    Last Post: 11-18-2004, 12:38 PM
  3. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 06:06 PM
  4. Replies: 4
    Last Post: 01-16-2002, 12:04 AM
  5. Very dangerous code...
    By Magos in forum C++ Programming
    Replies: 27
    Last Post: 11-11-2001, 02:01 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21