Thread: How to detect and remove the comments in a C source file

  1. #1
    Registered User zolfaghar's Avatar
    Join Date
    Mar 2016
    Posts
    95

    I have been working on a program that detects and removes the comments from a C source file, but the program just stops. Here is the code:


    Code:
    #include <stdio.h>
    // This code removes the comments from a file while moving the non-comments to another file.
    FILE *fs, *ft;
    int main ()
    {
    char source[67], target[67];
    int c,d;
    puts ("Enter C file name");
    gets (source);
    fs = fopen(source, "r");
    if ( fs==NULL)
      {
       puts ("Can not open source file");
      }
    puts ("Enter target name");
    gets (target);
    ft = fopen (target, "w");
    if (ft == NULL)
    {
      puts ("Can not open the target file");
    }
    while ( (c = getc(fs)) != EOF)
      {
        if ( c == '/')
          {
            if ( (d = getc(fs)) == '*')
            incomment();
            else
              {
                 fprintf ( ft, "%c", c);
                 fprintf ( ft, "%c", d);
              }
          }
        else
        fprintf(ft, "%c", c);
      }
      fclose (fs);
      fclose (ft);

    I tried GDB. Besides moving the breakpoint further down the code and using continue, print, and step, is there any other GDB feature I can use to quickly find out where the program stops?

    I found Remove comments from C/C++ code - Stack Overflow and Jonathan Leffler's post implies that there are different ways of commenting in C. I think it is really clumsy to have to keep adding if blocks for every possible way to comment in C source files. Is there a more elegant way of doing this? Note, this is just an exercise.

    When I run GDB I get the following, and I do not know why c has such a large value. I also cannot figure out why I cannot backtrack.
    Code:
    Enter C file name
    test.c
    Enter target name
    output
    Breakpoint 1, incomment () at gd.c:44
    44      c = getc(fs);
    (gdb) p c
    $1 = 1982321136
    (gdb) bt
    #0  incomment () at gd.c:44
    #1  0x004014dc in main () at gd.c:27
    (gdb) bt
    #0  incomment () at gd.c:44
    #1  0x004014dc in main () at gd.c:27
    (gdb) bt
    #0  incomment () at gd.c:44
    #1  0x004014dc in main () at gd.c:27
    (gdb)
    More detailed debug output:

    Code:
    Enter C file name
    test.c
    Enter target name
    output
    Breakpoint 1, main () at gd.c:27
    27              incomment();
    (gdb) p c
    $1 = 47
    (gdb) p d
    $2 = 42
    (gdb) s
    incomment () at gd.c:44
    44      c = getc(fs);
    (gdb) p c
    $3 = 1982321136
    (gdb)
    Last edited by zolfaghar; 06-04-2016 at 03:54 PM.

  2. #2
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    A backtrace is for examining the call stack. Since you only called incomment() from main(), there isn't much to report. I think you are misunderstanding the use of backtrace as well. A backtrace will usually pinpoint the line a program crashed on, i.e. when you got a segmentation fault or other signal.
    Code:
    Breakpoint 1, incomment () at gd.c:44
    44      c = getc(fs);
    (gdb) p c
    $1 = 1982321136
    GDB stops before executing the line it displays, so getc() hasn't been called yet. The value you see is garbage because c has not been assigned anything at this point. If you stepped past this line, c would have an intelligible value.

    Quote Originally Posted by zolfaghar
    Is there a more elegant way of doing this? Note, this is just an exercise.
    The easiest way to parse this is to look for / followed by *, and when you find that pair, set a flag. While the flag is set, you are inside a comment, so you would write what you read to another file that holds just the comments. When you find * followed by / and the flag is set, you are leaving the comment: clear the flag and go back to writing the content to the non-comment file.

    This logical loop repeats until all of the input has been sifted into comment and non-comment files.

    This is a simple example of a finite state machine, and only handles comments of the form /*comment*/ (rather poorly).
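
    The flag approach described above can be sketched roughly as follows. This is only an illustration under a few assumptions: it works on in-memory strings instead of files to keep it short, the name split_block_comments is made up for this example, and like the description it only knows about /* ... */ comments, so a /* inside a string literal will fool it.

```c
#include <assert.h>
#include <string.h>

/*
 * Sift src into code[] and comments[] using a single "inside a comment" flag.
 * Both output buffers must hold at least strlen(src) + 1 bytes.
 * split_block_comments is a hypothetical name, not code from this thread.
 */
void split_block_comments(const char *src, char *code, char *comments)
{
    int in_comment = 0;   /* the flag: nonzero while inside a block comment */

    assert(src != NULL && code != NULL && comments != NULL);

    while (*src != '\0') {
        if (!in_comment && strncmp(src, "/*", 2) == 0) {
            in_comment = 1;            /* entering a comment: skip the opener */
            src += 2;
        } else if (in_comment && strncmp(src, "*/", 2) == 0) {
            in_comment = 0;            /* leaving a comment: skip the closer */
            src += 2;
        } else if (in_comment) {
            *comments++ = *src++;      /* comment text goes to one buffer */
        } else {
            *code++ = *src++;          /* everything else goes to the other */
        }
    }
    *code = '\0';
    *comments = '\0';
}
```

    Calling it on "int a; /* note */ int b;" leaves "int a;  int b;" in the code buffer and " note " in the comments buffer. The file-based version has the same shape, with getc() and putc() in place of the pointer accesses.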

  3. #3
    Registered User
    Join Date
    May 2010
    Posts
    4,633
    You should never use gets(); this function can never be used safely. I suggest you consider using fgets() instead.

    Also why the global variables?
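
    As a rough sketch of the fgets() suggestion (read_line is a made-up helper name, not a standard function): fgets() takes the buffer size, so it cannot overrun the buffer, but it keeps the trailing newline, which has to be removed before the text is passed to fopen().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line from `in` into buf (at most size - 1 characters) and strip
 * the trailing newline. Returns 1 on success, 0 on end of file or error.
 * If the line is longer than the buffer, the rest stays in the stream.
 * read_line is a hypothetical helper, not part of the C library.
 */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);

    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* fgets keeps the newline; drop it */
    return 1;
}
```

    In main() this would be used as read_line(stdin, source, sizeof source), and the two FILE pointers can simply be declared inside main() and passed to any helper function as parameters.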


    Jim

  4. #4
    Registered User zolfaghar's Avatar
    Join Date
    Mar 2016
    Posts
    95
    Quote Originally Posted by jimblumberg
    You should never use gets(); this function can never be used safely. I suggest you consider using fgets() instead.

    Also why the global variables?


    Jim
    I'll revise the code to use fgets() and also remove the global variables.
    Thanks

  5. #5
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    Here's some code to test your comment-remover on.
    Code:
    #include <stdio.h>
    /* 
     * multi-line
     * comment
     */
    int main(void) {
        // single line comment (end token is newline character)
        printf(" /* <-- doesn't start a multi-line comment.\n");
        printf(" */ <-- if in a multi-line comment, WOULD end it (but wouldn't compile).\n");
        printf(" // <-- doesn't start a single-line comment.\n");
        printf(" \" <-- doesn't end a string constant.\n");
        /* another comment */
        return 0;
    }
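
    To get through a test file like the one above, the scanner needs more states than a single flag: it has to know when it is inside a string or character literal, and it has to honour backslash escapes there. The following is one possible sketch, not a definitive implementation. All names are invented for the example, it works on strings instead of files, and it assumes no backslash-newline inside // comments, no trigraphs and no C++ raw strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scanner states. All names are illustrative, not taken from the thread. */
enum state { CODE, SLASH, BLOCK, BLOCK_STAR, LINE, QUOTED, QUOTED_ESC };

/*
 * Copy src to out with comments removed. out must hold strlen(src) + 1 bytes.
 * A block comment becomes one space; a // comment is dropped up to, but not
 * including, its newline.
 */
void strip_comments(const char *src, char *out)
{
    enum state st = CODE;
    char quote = 0;                 /* which quote opened the current literal */
    size_t i, n;

    assert(src != NULL && out != NULL);
    n = strlen(src);

    for (i = 0; i < n; i++) {
        char c = src[i];
        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;         /* hold the slash until we see what follows */
            } else {
                if (c == '"' || c == '\'') {
                    quote = c;
                    st = QUOTED;
                }
                *out++ = c;
            }
            break;
        case SLASH:
            if (c == '*') {
                st = BLOCK;
            } else if (c == '/') {
                st = LINE;
            } else {
                *out++ = '/';       /* an ordinary slash, e.g. division */
                st = CODE;
                i--;                /* re-examine c in the CODE state */
            }
            break;
        case BLOCK:
            if (c == '*')
                st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') {
                *out++ = ' ';       /* the whole comment becomes one space */
                st = CODE;
            } else if (c != '*') {
                st = BLOCK;
            }
            break;
        case LINE:
            if (c == '\n') {
                *out++ = c;         /* the newline itself is not part of the comment */
                st = CODE;
            }
            break;
        case QUOTED:
            *out++ = c;
            if (c == '\\')
                st = QUOTED_ESC;
            else if (c == quote)
                st = CODE;
            break;
        case QUOTED_ESC:
            *out++ = c;             /* an escaped character never ends the literal */
            st = QUOTED;
            break;
        }
    }
    if (st == SLASH)
        *out++ = '/';               /* input ended right after a lone slash */
    *out = '\0';
}
```

    Adding another kind of comment or literal then means adding a state and its transitions instead of nesting more if blocks. An unterminated block comment is silently dropped in this sketch; a real tool would probably report it.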

  6. #6
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    945
    Here's a slightly more "challenging" test:

    Code:
    //*/ this is a single-line comment
    //\
    this is a single-line comment split across two lines of source with backslash-newline
    (Incidentally, this forum's syntax highlighter doesn't recognize the second line of the second comment as a comment, so clearly not everyone gets it right.)
    Last edited by christop; 06-06-2016 at 02:13 PM. Reason: ETA comment about this forum's syntax highlighter

  7. #7
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    I think it is a fairly unknown feature... I didn't think that the backslash-newline sequence applied to comments.

  8. #8
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    The following post sheds light on the possibility of backslash-newline appearing in comments. Line splicing is applied before comments are removed, so it appears to be valid.

    C++ single line comments followed by \ transforms in multiline comment - Stack Overflow
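
    One way a comment remover could cope with this is to do the splicing itself as a separate pass before it looks for comments, mirroring the order of the translation phases. A minimal sketch, assuming the whole file is already in a writable buffer with plain \n line endings (splice_lines is a made-up name):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Delete every backslash-newline pair from s, in place, the way line
 * splicing does before comments are recognized. Returns the new length.
 * splice_lines is a hypothetical helper; it does not handle \r\n endings.
 */
size_t splice_lines(char *s)
{
    char *dst = s;
    const char *src = s;

    assert(s != NULL);

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n')
            src += 2;               /* drop the pair, joining the two lines */
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

    After this pass the two-line comment from the earlier post is a single // line, so a scanner that ends a // comment at the first newline handles it correctly. The cost is that line numbers in the output no longer match the original file.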

  9. #9
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Another way to sneak in comments is with an #if ... that evaluates to zero, or just #if 0 .

  10. #10
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by c99tutorial
    The following post sheds light on the possibility of backslash-newline appearing in comments. Line splicing is applied before comments are removed, so it appears to be valid.
    Yeah, in fact the translation phase in which comments are replaced by spaces happens before preprocessing directives are executed, and I have noticed that the syntax highlighter here likewise fails to highlight multi-line macros correctly.

    Quote Originally Posted by rcgldr
    Another way to sneak in comments is with an #if ... that evaluates to zero, or just #if 0 .
    Those aren't comments according to the grammar, even though they can be used to "comment out" blocks of code.
