Thread: why is duplicate code dangerous?

  1. #1
    Registered User
    Join Date
    Dec 2007
    Posts
    67

    why is duplicate code dangerous?

    Code:
     do{
    	x = 0;
    	do{
    	a = fgetc (fp);
    	if (a != ' ' && a != '\n' && 
    		a != EOF && a !=',' && 
    		a !=';' && a !='.'&& 
    		a !='!' && a !='?' && 
    		a !=':' && a !='`'){
    	word[x] = tolower(a);
    	  x++;
    	  }else{
    	if (x != 0){
    		word[x] = '\0';
    	    t = insertList (word, t);
    		wordcount += 1;
    		}
    	}
          }while (a != ' ' && a != '\n' && 
    			  a != EOF && a !=',' && 
    			  a !=';' && a !='.'&& 
    			  a !='!' && a !='?' && 
    			  a !=':' && a !='`');
          
        } while (a != EOF);
    here is my function that reads in characters from a .txt file, forms them into words and puts them in a list excluding characters i specified). i was told that this approach is dangerous for some reason and that i should only test characters once. why is that important and how can i rewrite the above loop? (i need the loop to place words in a list while excluding double spaces as well. i tried doing in with one test but double spaces were included into the list and messed up my word count). thanks

  2. #2
    Registered User
    Join Date
    Jan 2008
    Posts
    58
    Duplicate code is dangerous because if you need to change it, you might forget to change all of the duplicates too. I'd fix your loop with a function.
    Code:
    int is_punctuation(int a)
    {
        return a != ' ' && a != '\n' && 
               a != EOF && a !=',' && 
               a !=';' && a !='.'&& 
               a !='!' && a !='?' && 
               a !=':' && a !='`';
    }
    instead of doing the same test over and over, call the function.
    Code:
    do
    {
        x = 0;
        do
        {
            a = fgetc (fp);
            if (is_punctuation(a))
            {
                word[x] = tolower(a);
                x++;
            }
            else
            {
                if (x != 0)
                {
                    word[x] = '\0';
                    t = insertList (word, t);
                    wordcount += 1;
                }
            }
        }
        while (is_punctuation(a));
    }
    while (a != EOF);
    I kind of guessed about the name for the function.

  3. #3
    Registered User
    Join Date
    Dec 2007
    Posts
    67
    thanks i will try that.

  4. #4
    Registered User
    Join Date
    Jan 2008
    Posts
    1
    under <ctype.h> isnt there a ispunc command?

  5. #5
    The superhaterodyne twomers's Avatar
    Join Date
    Dec 2005
    Location
    Ireland
    Posts
    2,273
    edit: Never mind. I thought you were testing punctuation to ensure that tolower() didn't mess a , down.

    Also. I don't think there's a need to check for punctuation:
    Code:
    int main( void ) {
      char array[] = "ABCDefG,.12345";
      int i;
    
      for ( i=0; i<sizeof array; i++ )
        printf( "&#37;c :: %c\n", array[i], tolower(array[i]) );
    }
    My output:
    Code:
    A :: a
    B :: b
    C :: c
    D :: d
    e :: e
    f :: f
    G :: g
    , :: ,
    . :: .
    1 :: 1
    2 :: 2
    3 :: 3
    4 :: 4
    5 :: 5
    Edit: Also is_punctuation should be called not_punctuation.
    Last edited by twomers; 01-06-2008 at 07:22 PM.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Duplicated code is "bad" in a few aspects:
    1. You may end up changing only one of the two or more places that need changing.
    2. It is sometimes hard to determine if the similar looking code is actually identical, or just nearly the same [and if it's actually supposed to be the same or not]. This is of course related to 1 - you may be looking for a bug, and finding that the code is "nearly identical", you may assume that it's supposed to be - but it's NOT. [This of course means that it should have a clear comment to say "This is not a duplicate of code (above, below, in file X), it is different in that ..."]
    3. If the duplication is large enough, it adds to the code-space used in the executable, and thus makes the executable larger. This is one of those things that is difficult to balance - larger code MAY run faster because it's "inline", meaning that there is no jump to another function - or it may run slower because the caches are more used. In this particular case I would suggest a common function is better, because the processor will do better branch prediction based on the previous run of the same function.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    In addition, the compiler has heuristics that may make it inline the code of the function anyway.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by CornedBee View Post
    In addition, the compiler has heuristics that may make it inline the code of the function anyway.
    Yes, indeed. I should probably have added that the compiler has those. However, I'm not aware of any compiler that can do "function out of duplicated code", so if you duplicate a sufficiently large portion of code, the compiler will NOT be able to make the "don't inline this" optimization.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  9. #9
    Fountain of knowledge.
    Join Date
    May 2006
    Posts
    794
    Quote Originally Posted by agentsmith View Post
    Code:
     do{
    	x = 0;
    	do{
    	a = fgetc (fp);
    	if (a != ' ' && a != '\n' && 
    		a != EOF && a !=',' && 
    		a !=';' && a !='.'&& 
    		a !='!' && a !='?' && 
    		a !=':' && a !='`'){
    	word[x] = tolower(a);
    	  x++;
    	  }else{
    	if (x != 0){
    		word[x] = '\0';
    	    t = insertList (word, t);
    		wordcount += 1;
    		}
    	}
          }while (a != ' ' && a != '\n' && 
    			  a != EOF && a !=',' && 
    			  a !=';' && a !='.'&& 
    			  a !='!' && a !='?' && 
    			  a !=':' && a !='`');
          
        } while (a != EOF);
    here is my function that reads in characters from a .txt file, forms them into words and puts them in a list excluding characters i specified). i was told that this approach is dangerous for some reason and that i should only test characters once. why is that important and how can i rewrite the above loop? (i need the loop to place words in a list while excluding double spaces as well. i tried doing in with one test but double spaces were included into the list and messed up my word count). thanks
    That looks a bit messy to me.
    I would do it something like this, if I have understood the question correctly.
    It's just the general idea and it has not been compiled or tested.
    So don't blame me if it does not work or compile :O)
    I would be surprised if it did!!
    Well not that surprised ;O)


    Code:
    char word[MAX_NUMBER_OF_WORDS][MAX_WORD_LENGTH];
    
    int w=0;
    int  x=0;
    while ( ( a = fgetc (fp)   ) != EOF){
           v= tolower(a);
           if (v>='a' && v<='z'){
                word[w,x++] = a;       
           }
            else{
    	        w++; x=0;
    	}
     }

  10. #10
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    So don't blame me if it does not work or compile
    But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  11. #11
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    so if you duplicate a sufficiently large portion of code, the compiler will NOT be able to make the "don't inline this" optimization.
    I am a little confused here. It seems to me that "if you duplicate a sufficiently large portion of code" by replacing them with calls of a function, the compiler will evaluate the situation as it being bad for inlining, so it will make the "don't inline this" optimisation (or to look at it another way: the compiler will not make the "inline this" optimisation), possibly even against the programmer's suggestion (e.g., with the inline keyword).

    Off topic: I find it ironic that "agentsmith" (c.f. Matrix trilogy) is talking about duplicate code.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  12. #12
    Fountain of knowledge.
    Join Date
    May 2006
    Posts
    794
    Quote Originally Posted by vart View Post
    But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language
    Not really, the compiler is better at that then me, it will inform me of any errors
    when I compile it.

    But then I expect all the code you write compiles first time because you are so clever???

  13. #13
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    it will inform me of any errors
    ha-ha
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  14. #14
    Fountain of knowledge.
    Join Date
    May 2006
    Posts
    794
    Quote Originally Posted by vart View Post
    But of course we will blame you. Somebody posting the example should AT LEAST learn the syntax of the language
    OK so I put
    word[w,x++] = a;
    Instead of
    word[w][x++] = a;

    Big deal!
    Seems to compile nicely now, and no doubt work ;O)

    Code:
    int word[MAX_NUMBER_OF_WORDS][MAX_WORD_LENGTH];
    
    int w=0;
    int  x=0;
    while ( ( a = fgetc (fp)   ) != EOF){
           v= tolower(a);
           if (v>='a' && v<='z'){
                word[w][x++] = a;       
           }
            else{
    	        w++; x=0;
    	}
     }

  15. #15
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    compile? work? yeah... [sarcasm]
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Proposal: Code colouring
    By Perspective in forum A Brief History of Cprogramming.com
    Replies: 28
    Last Post: 05-14-2007, 07:23 AM
  2. Updated sound engine code
    By VirtualAce in forum Game Programming
    Replies: 8
    Last Post: 11-18-2004, 12:38 PM
  3. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM
  4. Replies: 4
    Last Post: 01-16-2002, 12:04 AM
  5. Very dangerous code...
    By Magos in forum C++ Programming
    Replies: 27
    Last Post: 11-11-2001, 02:01 AM