[C] remove comments

This is a discussion on [C] remove comments within the C Programming forums, part of the General Programming Boards category.

  1. #16
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by laserlight
    Besides automatic concatenation, a string literal can span multiple lines in the source code by using a backslash at the end of the lines that it spans (other than the last). However, if I remember correctly this would become a single line after preprocessing.
    You don't have to have a backslash.
    Code:
    char foo[] = "this is"
        " fine";

    Quzah.
    Hope is the first step on the road to disappointment.

  2. #17
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,593
    Quote Originally Posted by quzah
    You don't have to have a backslash.
    That is automatic concatenation (of adjacent string literals). I am talking about a string literal spanning multiple lines in the physical source code, where the multiple lines become a single line.
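A small sketch of the two cases being contrasted here (the names concatenated, spliced and literals_match are invented for illustration). Going by the translation phases in 5.1.1.2 of the C standard, a backslash immediately followed by a newline is deleted in phase 2, while adjacent string literals are joined much later, in phase 6, so both arrays below should end up holding the same text:

```c
#include <string.h>

/* Two adjacent string literals: joined in translation phase 6. */
static const char concatenated[] = "this is"
                                   " fine";

/* One literal split by a backslash-newline: the splice happens in phase 2.
   The continuation line must start in column 0, otherwise its leading
   blanks become part of the string. */
static const char spliced[] = "this is \
fine";

/* Returns 1 if both spellings produced the same string. */
int literals_match(void)
{
    return strcmp(concatenated, spliced) == 0;
}
```

The spliced form is fragile: any indentation on the continuation line ends up inside the string, which is one reason the concatenation form is usually preferred.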
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #18
    Registered User
    Join Date
    Jul 2009
    Location
    Croatia
    Posts
    272
    What you need to do is come up with a list of test cases that you want to support. The list should include all the complex possibilities with single quotes, double quotes, multi-line comments and any other possibilities you can think of. Do you need to worry about ??/ used in place of \? What about C++/C99 single-line comments? If yes, write a few examples.
    Yes, that's where I need your help. I know I need four states; what I don't know are the conditions for switching between them.

    I'm going to work with the ANSI C standard, so no // comments. They seem illogical anyway: I'm checking a C program, not a C++ program or anything else.

    So I want to support everything I can:

    1. If I'm in the double_quote state and a backslash appears, the character following the backslash (except \n) is ignored, and the state remains double_quote after that.

    So I need the states double_quote and dquote_escape.

    2. If I'm in the double_quote state and the line ends before the closing quote, as here:

    Code:
    printf("test
              test");
    this is an error as well (not really an error; my program just prints a warning and changes the state back): if(state==dquote && c=='\n') state = normal;


    So I think I covered every possibility for the dquote state this way. Is there any other? I can't think of anything else.
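For reference, the double-quote transitions listed above could be sketched as a small transition function (dq_next, dq_check and the DQ_ state names are hypothetical, not from the exercise). One design choice to note: after a backslash, even a newline returns to the string state, because in real C a backslash-newline pair is a line splice and does not end the literal. Trigraphs are ignored here.

```c
/* Hypothetical states for the double-quote part of the checker. */
enum dq_state { DQ_NORMAL, DQ_QUOTE, DQ_ESCAPE };

/* Returns the state after reading c. Sets *unterminated to 1 when a
   newline arrives while a string literal is still open. */
enum dq_state dq_next(enum dq_state s, int c, int *unterminated)
{
    switch (s) {
    case DQ_NORMAL:
        return c == '"' ? DQ_QUOTE : DQ_NORMAL;
    case DQ_QUOTE:
        if (c == '"')
            return DQ_NORMAL;          /* closing quote */
        if (c == '\\')
            return DQ_ESCAPE;          /* next character is escaped */
        if (c == '\n') {
            *unterminated = 1;         /* literal never closed */
            return DQ_NORMAL;
        }
        return DQ_QUOTE;
    case DQ_ESCAPE:
        return DQ_QUOTE;               /* \" and \\ stay inside the string */
    }
    return s;
}

/* Runs the machine over a whole buffer. Returns 1 if every string
   literal was closed on its own line. */
int dq_check(const char *src)
{
    enum dq_state s = DQ_NORMAL;
    int bad = 0;

    for (; *src != '\0'; ++src)
        s = dq_next(s, (unsigned char)*src, &bad);
    return !bad && s == DQ_NORMAL;
}
```

Under those assumptions, the only characters that change anything inside a string are the closing quote, a backslash, and a premature newline.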

    As for the single_quote state, I still have no idea what to do about it. I really need some advice here.
    Last edited by Tool; 11-15-2009 at 06:14 AM.

  4. #19
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by Tool
    I'm going to work with ANSI C Standard.
    Which year?
    Quote Originally Posted by Tool
    So no stuff like //, it seems illogical anyway - i'm checking a C program, not a C++ program or anything else.
    The C99 Standard allows for // comments.
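The choice of standard matters even for a comment stripper, because the same characters are read differently. The classic case is a = b //* divide */ c; which C89 reads as a = b / c; but C99 reads as a = b followed by a line comment. A hypothetical helper (comment_opener and the OPEN_ names are made up) that classifies what starts at a given position, assuming the scanner is in its normal state outside strings and character constants, might look like this:

```c
/* What a '/' at position p opens, if anything. */
enum opener { OPEN_NONE, OPEN_BLOCK, OPEN_LINE };

/* c99 != 0 enables line comments; with c99 == 0 only block comments count. */
enum opener comment_opener(const char *p, int c99)
{
    if (p[0] != '/')
        return OPEN_NONE;
    if (p[1] == '*')
        return OPEN_BLOCK;         /* start of a block comment */
    if (c99 && p[1] == '/')
        return OPEN_LINE;          /* C99: comment runs to end of line */
    return OPEN_NONE;              /* a plain division operator */
}
```

In C89 mode the scanner copies the first '/' as an operator and then finds a block comment one character later, which gives exactly the a = b / c; reading.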


    Quzah.

  5. #20
    Registered User
    Join Date
    Jul 2009
    Location
    Croatia
    Posts
    272
    The standard that doesn't allow //.

    May I get some serious answers now?
    Last edited by Tool; 11-15-2009 at 06:24 AM.

  6. #21
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Did you actually have a question? You've been given a bunch of hints and suggestions without actually posting any code, so I'm not sure what you're expecting here. Especially since you aren't actually asking questions.


    Quzah.

  7. #22
    Registered User
    Join Date
    Jul 2009
    Location
    Croatia
    Posts
    272
    I'm looking for confirmation about the double_quote state, as I wrote two posts ago.

    My question is: did I miss a possible transition out of the dquote state, aside from the two possibilities I wrote two posts ago?

    I can't start writing code if I don't know what the exercise asks for.

    My second question, which I didn't get any answer to, is: how am I supposed to treat single quotes in this exercise? I don't understand this part at all. You can't open a comment inside ' ', so what did the authors mean when they said "Don't forget to handle character constants properly"?


    This is the code I'm planning to use. I'm just going to take it from exercise 1-24 and modify it a bit. Where I get boggled is that I don't know what to do about single quotes in this exercise.

    Code:
          int c;
          enum states { normal,
                        comment_entry, comment, comment_exit,
                        squote, squote_escape, squote2, squote3,
                        dquote, dquote_escape };
    
    ...
    
    
    
          while((c=getchar())!=EOF)
          {
              if(state==normal && c=='/') { state = comment_entry; }
              else if(state==normal && c=='"') { state = dquote; }
              else if(state==normal && c=='\'') { state = squote; }
              
              else if(state==normal) { if((check(c))==0) {printf("Line %d: Mismatching %c\n", line, c); valid=0;} } 
              
              else if(state==comment_entry && c=='*') { state = comment; }
              else if(state==comment_entry && c=='/') { }
              else if(state==comment_entry) { state = normal;  if((check(c))==0) {printf("Line %d: Mismatching %c\n", line, c); valid=0; }  }  /* necessary to check braces here as well */
              
              else if(state==comment && c=='*') { state = comment_exit; }
              else if(state==comment) {}
              
              else if(state==comment_exit && c=='*') {}
              else if(state==comment_exit && c=='/') { state = normal; }
              else if(state==comment_exit) { state = comment; }
              
              else if(state==dquote && c=='"') { state = normal; }
              else if(state==dquote && c=='\n') { state = normal; printf("Line %d: Double quotes not properly closed.\n", line); valid=0; }
              else if(state==dquote && c=='\\') { state = dquote_escape; }
              else if(state==dquote) { }
              
              else if(state==dquote_escape) { state = dquote; }
    }
    Last edited by Tool; 11-15-2009 at 06:56 AM.
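To illustrate why character constants need their own handling: '"' is a character constant holding a double quote. A scanner that only knows about string literals (the hypothetical string_open_naive below, written just for this demonstration) takes that quote as the start of a string, so a real comment later on the same line would be treated as string text and left in place.

```c
/* Deliberately naive: understands string literals (with backslash
   escapes) but not character constants. Returns 1 if, at the end of
   the line, it believes a string literal is still open. */
int string_open_naive(const char *line)
{
    int in_str = 0;

    for (; *line != '\0'; ++line) {
        if (in_str && *line == '\\' && line[1] != '\0')
            ++line;              /* skip the escaped character */
        else if (*line == '"')
            in_str = !in_str;    /* every double quote toggles, even inside single quotes */
    }
    return in_str;
}
```

Tracking single quotes with their own state, the same way as double quotes (as King Mir suggests further down the thread), removes the problem.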

  8. #23
    Registered User
    Join Date
    Jul 2009
    Location
    Croatia
    Posts
    272
    OK, well, I've come up with some code, but it only handles the double_quote, comment and normal states. I didn't include character constants because I don't know how to treat them.

    It reads the program into an array and processes the data with a while loop.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    #define MAX_SIZE 1024
    
    int main(int argc, char *argv[])
    {
        
         int c;
         int x=0;                  
         char array[MAX_SIZE + 2];   /* +2: room for the '\n' and '\0' appended after the input */
         enum states { normal,
                       comment,
                       dquote, dquote_escape };
         int state = normal;
             
         /*
         FILE *f;
         f = fopen("zad88.c", "r");
         if (f == NULL) return 1;
            
         while((c=fgetc(f))!=EOF) 
         */
         
         while((c=getchar())!=EOF)
         {
             if(x<MAX_SIZE)
                 array[x++]=c;
             else
                 return 1;   /* input too large for the buffer */
         }
         
         array[x]='\n';
         array[++x]='\0';
         
         int pom=x;
         x=0;
         
         while(x!=pom)
         {
            if(array[x]=='/' && array[x+1] == '*' && state == normal) { array[x]=' '; state = comment; x++;}
            else if(array[x]=='*' && array[x+1] == '/' && state == comment) { array[x]=array[x+1]=' '; state = normal; x++; }
         
            else if(array[x]=='"' && state == normal) { state = dquote; }
            else if(array[x]=='\\' && state == dquote) { state = dquote_escape; }
            else if(state==dquote_escape) { state = dquote; }
            else if(state == dquote && array[x]=='\n') { state = normal; }
            else if(array[x]=='"' && state == dquote) { state = normal; }
            
            if(state == comment)
            array[x]=' ';
            
            x++;
         } 
            
         
         printf("%s\n", array);   
         
         
         
     
      printf("Press any key to continue.\n");	
      getchar();
      return 0;
    }

  9. #24
    Registered User
    Join Date
    Apr 2006
    Posts
    2,023
    You should separate the next-state logic for each state, for instance:
    Code:
    switch(state){
      case normal:
      {
        if(array[x]=='/' && array[x+1] == '*')
          state=comment;
        break;
      }
      case comment:
      {
        if(array[x]=='*' && array[x+1] == '/')
          state=normal;
        break;
      }
    }
    It makes it easier to follow.

    Single-quoted character literals will also need a state, I think. They should work just like double quotes, except that they begin and end with single quotes.
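Putting both suggestions together (one switch case per state, and single quotes handled like double quotes), a sketch might look like this. It works on an in-memory NUL-terminated string, assumes dst has room for strlen(src) + 1 bytes (the output is never longer than the input), ignores trigraphs and line splices, and, like Tool's code, treats a newline inside an unclosed literal as ending it. The function and state names are made up for this example.

```c
#include <string.h>

/* Hypothetical state names; the S_SQUOTE states mirror the S_DQUOTE ones. */
enum strip_state { S_NORMAL, S_COMMENT, S_DQUOTE, S_DQUOTE_ESC, S_SQUOTE, S_SQUOTE_ESC };

/* Copies src to dst, replacing each comment with one space. */
void strip_comments(const char *src, char *dst)
{
    enum strip_state s = S_NORMAL;

    for (; *src != '\0'; ++src) {
        char c = *src;

        switch (s) {
        case S_NORMAL:
            if (c == '/' && src[1] == '*') {
                s = S_COMMENT;
                ++src;               /* consume the '*' as well */
                *dst++ = ' ';        /* the comment becomes one space */
                continue;
            }
            if (c == '"')
                s = S_DQUOTE;
            else if (c == '\'')
                s = S_SQUOTE;
            break;
        case S_COMMENT:
            if (c == '*' && src[1] == '/') {
                s = S_NORMAL;
                ++src;               /* consume the closing '/' */
            }
            continue;                /* comment text is never copied */
        case S_DQUOTE:
        case S_SQUOTE: {
            char close = (s == S_DQUOTE) ? '"' : '\'';
            if (c == '\\')
                s = (s == S_DQUOTE) ? S_DQUOTE_ESC : S_SQUOTE_ESC;
            else if (c == close || c == '\n')
                s = S_NORMAL;        /* closed, or unterminated at end of line */
            break;
        }
        case S_DQUOTE_ESC:
            s = S_DQUOTE;            /* the escaped character stays in the literal */
            break;
        case S_SQUOTE_ESC:
            s = S_SQUOTE;
            break;
        }
        *dst++ = c;
    }
    *dst = '\0';
}
```

Because the double-quote and single-quote cases share one block, the only difference between them is which character closes the literal.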

    And are you going to worry about the ??/ trigraph? Eg:
    Code:
    /* the line below does not start a comment (with trigraphs enabled) */
    "??/"/*";
    Last edited by King Mir; 11-15-2009 at 05:12 PM.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  10. #25
    Registered User
    Join Date
    Nov 2009
    Posts
    7
    Quote Originally Posted by King Mir

    And are you going to worry about the ??/ trigraph? Eg:
    Code:
    /* the line below does not start a comment (with trigraphs enabled) */
    "??/"/*";
    If this is a prelude to studying compilers, you might want to:

    1. Replace "\r\n" sequences with "\n", and replace a lone "\r" with "\n".
    2. Merge continued lines. Be aware that the backslash and newline may be separated by an arbitrary amount of white space.
    3. Replace trigraphs, if you like.
    4. Only then strip comments, or rather replace each one with a single space. Replacing BCPL-style (or C++) // comments with a newline instead makes for neater preprocessor output.
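Step 1 is the simplest to put into code. A minimal in-place sketch (normalize_newlines is a made-up name) that turns every CR LF pair and every lone CR into a single LF:

```c
#include <stddef.h>
#include <string.h>

/* Step 1: rewrite "\r\n" and a lone "\r" as "\n", in place.
   Returns the new length of the string. */
size_t normalize_newlines(char *s)
{
    char *r = s;   /* read position */
    char *w = s;   /* write position; never ahead of r */

    while (*r != '\0') {
        if (*r == '\r') {
            *w++ = '\n';
            if (r[1] == '\n')
                ++r;           /* CR LF: drop the LF, one newline is enough */
            ++r;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return (size_t)(w - s);
}
```

If the file is opened in text mode on Windows, the C library typically performs the CR LF translation already, so this mostly matters for binary-mode reads or files with foreign line endings.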

    You are aware of string literals. Treat character literals exactly as string literals. For example, do not strip comments from these lines (there are no comments):

    Code:
    x = "/* blah */";
    y = '/* blah */';
    The latter may be a syntactic error, but it is a valid token, and you should not care about it (yet).

    When you are done worrying about terminators, continued lines, trigraphs and string literals, you may also want to correctly handle this:

    Code:
                 #             include <uh/*this_is_not_a_comment*/oh.h>
    Which is a valid path.

    Yeah, bummer, you need to parse preprocessor tokens as well.

    Nobody said it was going to be easy...

    [edit]

    Dammit, I just found this:

    If the characters ', \, ", //, or /* occur in the sequence between the < and > delimiters, the behavior is undefined. Similarly, if the characters ', \, //, or /* occur in the sequence between the " delimiters, the behavior is undefined.
    And in a footnote:

    Thus, sequences of characters that resemble escape sequences cause undefined behaviour.
    This implies that my example is *NOT* valid as far as C source goes, and you are free to treat it however you like. I am going to have to edit this entire post.

    The quote is from 6.4.7, paragraph 3, of

    http://www.open-std.org/jtc1/sc22/wg...docs/n1336.pdf

    The translation phases are described in 5.1.1.2.

    [edit]:

    It is summarised (in ASCII art, for the win) here:

    Poster with the 8 phases of translation in the C language - Stack Overflow

    [edit]:

    I noted that it says "Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines."

    You will find that some compilers allow for whitespace between the two (a mistake easily made). I am thinking GCC.
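Step 2 from the list above, in both flavours, could be sketched like this. splice_lines is a hypothetical helper; its lenient flag imitates compilers that accept blanks between the backslash and the newline (GCC, for one, appears to accept this with a warning), while strict mode follows the wording quoted above.

```c
#include <string.h>

/* Step 2: splice physical lines into logical lines, in place.
   lenient == 0: only a backslash immediately followed by a newline
   is removed, as 5.1.1.2 requires.
   lenient != 0: spaces and tabs between the backslash and the newline
   are tolerated and removed too. */
void splice_lines(char *s, int lenient)
{
    char *r = s, *w = s;

    while (*r != '\0') {
        if (*r == '\\') {
            char *p = r + 1;
            if (lenient)
                while (*p == ' ' || *p == '\t')
                    ++p;
            if (*p == '\n') {
                r = p + 1;      /* drop backslash, blanks and newline */
                continue;
            }
        }
        *w++ = *r++;
    }
    *w = '\0';
}
```

In strict mode a backslash followed by a space is left alone, so whatever comes after it has to deal with a stray backslash, which is usually how the mistake gets noticed.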

    When you are done replacing comments, you can continue with the other bullets. Let us see what you come up with. Especially when you have finished the compiler ;-).

    No, really, I should like to comment as you go along, and be your designated beta tester ;-)

    [another edit]:

    Okay, I see that in your previous source you just overwrite comments with spaces and then dump the entire array. While that is of course a (pragmatic) way to do it, you could operate on a smaller "window" of the source file and generate output as you go (skipping comments). Instead of holding the whole file in memory (which you do in a fixed-size array on the stack, imposing limits), you can use file offsets to recall earlier locations in the source file. I know your focus is elsewhere; I am just suggesting... If a thing is worth doing, it is worth doing properly (someone said).
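A sketch of the streaming idea, assuming the usual comment/string/character states plus two extra ones: SLASH holds back a '/' until the next character shows whether it opens a comment, and STAR does the same for a '*' inside a comment. Nothing is buffered, so there is no MAX_SIZE limit and no lookahead past the current character. strip_stream is a made-up name; it replaces each comment with a single space and ignores trigraphs and line splices.

```c
#include <stdio.h>
#include <string.h>

/* Copies in to out, replacing each comment with one space,
   without buffering the file. */
void strip_stream(FILE *in, FILE *out)
{
    enum { NORMAL, SLASH, COMMENT, STAR,
           DQUOTE, DQUOTE_ESC, SQUOTE, SQUOTE_ESC } s = NORMAL;
    int c;

    while ((c = getc(in)) != EOF) {
        switch (s) {
        case SLASH:
            if (c == '*') {
                s = COMMENT;
                putc(' ', out);      /* the whole comment becomes one space */
                continue;
            }
            putc('/', out);          /* it was an ordinary '/' after all */
            s = NORMAL;
            /* fall through: handle c as normal text */
        case NORMAL:
            if (c == '/') {
                s = SLASH;           /* decide on the next character */
                continue;
            }
            if (c == '"')
                s = DQUOTE;
            else if (c == '\'')
                s = SQUOTE;
            break;
        case COMMENT:
            if (c == '*')
                s = STAR;
            continue;                /* comment text is not copied */
        case STAR:
            if (c == '/')
                s = NORMAL;          /* end of comment */
            else if (c != '*')
                s = COMMENT;         /* a run of '*' keeps us in STAR */
            continue;
        case DQUOTE:
            if (c == '\\')
                s = DQUOTE_ESC;
            else if (c == '"' || c == '\n')
                s = NORMAL;
            break;
        case SQUOTE:
            if (c == '\\')
                s = SQUOTE_ESC;
            else if (c == '\'' || c == '\n')
                s = NORMAL;
            break;
        case DQUOTE_ESC:
            s = DQUOTE;
            break;
        case SQUOTE_ESC:
            s = SQUOTE;
            break;
        }
        putc(c, out);
    }
    if (s == SLASH)
        putc('/', out);              /* input ended right after a '/' */
}
```

Called as strip_stream(stdin, stdout), it behaves as a filter, much like the getchar loop in Tool's earlier fragment.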

    --
    Aksel
    Last edited by Aksel; 11-17-2009 at 01:29 AM.

  11. #26
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,593
    Quote Originally Posted by Tool
    Does merging lines mean removing any blanks/tabs between the lines? And removing the '\n' itself as well?
    The idea is that usually, lines in the physical source code correspond to lines in the logical source code, but it is possible to continue a logical line onto the next physical line by "escaping" the newline at the end of the physical line. Aksel's observation is that some compilers do not conform strictly to the standard, allowing whitespace between the backslash and the newline. Either way, you would not be removing the newline at the end of the logical line, but you will remove the backslash and newline that form the continuation.

    Quote Originally Posted by Tool
    I'm not familiar with trigraphs. What are they exactly? What should I replace them with?
    They are certain sequences of three characters that are to be replaced by a single character. In King Mir's example, the sequence ??/ would be replaced by a single \ character.
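A sketch of that replacement (phase 1 in 5.1.1.2) for an in-memory string. The function name replace_trigraphs is made up; the nine-entry table is the standard trigraph list (??= #, ??( [, ??/ \, ??) ], ??' ^, ??< {, ??! |, ??> }, ??- ~). Run on King Mir's example, it turns "??/" into an escaped quote, so the /* that follows ends up inside a string literal.

```c
#include <string.h>

/* Phase 1: replace each of the nine trigraphs with its character, in place. */
void replace_trigraphs(char *s)
{
    static const char from[] = "=(/)'<!>-";
    static const char to[]   = "#[\\]^{|}~";   /* to[i] replaces from[i] */
    char *r = s, *w = s;

    while (*r != '\0') {
        const char *hit;
        if (r[0] == '?' && r[1] == '?' && r[2] != '\0'
            && (hit = strchr(from, r[2])) != NULL) {
            *w++ = to[hit - from];
            r += 3;
        } else {
            *w++ = *r++;      /* not a trigraph: copy one character */
        }
    }
    *w = '\0';
}
```

Question marks that do not form a trigraph are copied one at a time, so "???=" correctly becomes "?#".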

    Quote Originally Posted by Tool
    Also, what exactly is a carriage return, and does it have the same effect as a \n?
    You can interpret it as having the same effect as \n. To find out more, search on line endings.

  12. #27
    Registered User
    Join Date
    Jul 2009
    Location
    Croatia
    Posts
    272
    About trigraphs: how can I use them in practice?

    In my compiler, if I write, for example, ??< in place of the opening { brace, I get this warning when compiling:

    5:1 C:\Users\SUZI\Desktop\C Smeće\zad97.c [Warning] trigraph ??< ignored, use -trigraphs to enable
    Last edited by Tool; 11-18-2009 at 11:27 AM.

  13. #28
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,593
    As the warning suggested, pass -trigraphs as an option when invoking your compiler.

  14. #29
    Registered User
    Join Date
    Jul 2009
    Location
    Croatia
    Posts
    272
    Just to be clear:

    I assume that trigraphs only work in the normal state (not inside a comment, a character constant or a quoted string), correct?

  15. #30
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,593
    Quote Originally Posted by Tool
    I assume that trigraphs only work in the normal state (not inside a comment, a character constant or a quoted string), correct?
    They work everywhere. The replacement is done in a phase before tokenisation, so at that point comments, character constants and string literals in the source have not been identified.


Similar Threads

  1. Remove comments
    By St0rM-MaN in forum C Programming
    Replies: 4
    Last Post: 05-18-2007, 11:03 PM
  2. program to remove comments from source
    By Abda92 in forum C Programming
    Replies: 12
    Last Post: 12-25-2006, 04:18 PM
  3. Request for comments
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 15
    Last Post: 01-02-2004, 09:33 AM
  4. The Art of Writing Comments :: Software Engineering
    By kuphryn in forum C++ Programming
    Replies: 15
    Last Post: 11-23-2002, 04:18 PM
  5. remove comments from source code
    By limbo100 in forum C Programming
    Replies: 2
    Last Post: 09-29-2001, 06:25 PM
