Thread: program to remove comments from source

  1. #1
    Registered User
    Join Date
    Sep 2006
    Posts
    230

    program to remove comments from source

    Hi, I've been working on a program that opens a source file and copies it to another file, excluding all comments. I have the idea and the concept, and have written this source:
    Code:
    #include <stdio.h>
    #define IN 1       /* getc() is now reading comment(not outputting text) */
    #define OUT 0      /* getc() isn't reading comment(outputting text) */
    
    int main() {
       int mode, l, j, k;
       FILE *src, *newSrc;
       
       src = fopen("C:\\1-23.c", "rt");
       newSrc = fopen("C:\\src.c", "wt");
       mode = OUT;
       for (;;) {
          /* Entering comment */
          if ((l = getc(src)) == EOF) {
             printf("l End of file reached");
             break;
          }
          if ((j = getc(src)) == EOF) {
             printf("j End of file reached");
             break;
          }
          if (mode == OUT && l == '/' && j == '*') {
             mode = IN;
          }
          else if (mode == OUT && j == '/') {
             k = getc(src);
             if (k == '*') {
                mode = IN;
             }
             else {
                putc(j, newSrc);
                putc(l, newSrc);
                putc(k, newSrc);
                break;
             }
          }
          else if (mode == OUT) {
             putc(l, newSrc);
             putc(k, newSrc);
          }   
          /* Leaving comment */
          if (mode == IN && l == '*' && j == '/') {
             mode = OUT;
          }
          if (mode == IN && j == '/') {
             k = getc(src);
             if (k == '/') {
                mode = OUT;
             }
          }
       }
       fclose(src);
       fclose(newSrc);
       getchar();
       return 0;
    }
    I'm sorry if there's anything hard to understand in the source because I didn't write any comments. When I run this, it creates a blank file with weird characters in it. Can anyone please find where this bug is?

  2. #2
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    Code:
     else {
                putc(j, newSrc);
                putc(l, newSrc);
                putc(k, newSrc);
                break;
             }
    Wrong order of chars
    No need to break...

    There are other problems also...
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  3. #3
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    Why don't you work by getting one character at a time instead of two? It would make a lot of things simpler, if you ask me.
    Teacher: "You connect with Internet Explorer, but what is your browser? You know, Yahoo, Webcrawler...?" It's great to see the educational system moving in the right direction

  4. #4
    C 1337 Meshal's Avatar
    Join Date
    Nov 2006
    Posts
    70
    another thing use fgetc instead of getc
    http://www.cplusplus.com/ref/cstdio/fgetc.html

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > #define IN 1
    If you define more states, like COMMENT_LEADIN and COMMENT_LEADOUT then you only need to read one character at a time, and look at the current state to decide what the new state should be.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Thanks everyone, I got it working with the following source:
    Code:
    #include <stdio.h>
    #define IN 1       /* getc() is now reading comment(not outputting text) */
    #define OUT 0      /* getc() isn't reading comment(outputting text) */
    
    int main() {
       int mode, i, l, j, k;
       FILE *src, *newSrc;
       
       src = fopen("C:\\1-23.c", "rt");
       newSrc = fopen("C:\\src.c", "wt");
       mode = OUT;
       for (i = 1;;i++) {
          /* Entering comment */
          if ((l = getc(src)) == EOF) {
             printf("End of file reached");
             break;
          }
          if (mode == OUT && l == '/') {
             if ((j = getc(src)) == EOF) {
                printf("End of file reached");
             }
             else if (j == '*') {
                mode = IN;
             }
          }
          /* code to be written into new file if not entering comment */
          else if (mode == OUT) {
             putc(l, newSrc);
          }   
          /* Leaving comment */
          if (mode == IN && l == '*') {
             if ((j = getc(src)) == '/') {
                mode = OUT;
             }
          }
       }
       fclose(src);
       fclose(newSrc);
       getchar();
       return 0;
    }
    just a few questions for the future:
    Quote Originally Posted by meshal
    another thing use fgetc instead of getc
    http://www.cplusplus.com/ref/cstdio/fgetc.html
    Can you please explain why? I checked the link you provided, but I don't see a difference between the two functions.
    Quote Originally Posted by salem
    > #define IN 1
    If you define more states, like COMMENT_LEADIN and COMMENT_LEADOUT then you only need to read one character at a time, and look at the current state to decide what the new state should be.
    Can you please explain? I don't really understand.
    Thanks again everyone, I really appreciate your help.

  7. #7
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    You can make 4 states
    OUT -> COMMENT_LEADIN -> IN -> COMMENT_LEADOUT -> OUT

    The decision can be made based on one char only

  8. #8
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Thanks everyone, I got it working with the following source:
    Are you sure about that? This input:
    Code:
    #include <stdio.h>
    
    int main()
    {
    	int x = 6 / 2;
    	// This is a test
    	/* This is a test */
    	printf("/* Hey! Not really a test!*/\n");
    
    	return 0;
    }
    Resulted in this output:
    Code:
    #include <stdio.h>
    
    int main()
    {
    	int x = 6 2;
    	 This is a test
    	
    	printf("\n");
    
    	return 0;
    }
    It doesn't account for string literals (admittedly, that was a cruel test), but it also trips over division.

    Also, AFAIK, "wt" and "rt" are not valid modes for fopen(). (Perhaps "w+" and "r+"?) See the man page on fopen().

    Also... why? Comments are a good thing, most of the time. (The fourth part of my signature is meant tongue in cheek, I assure you.)
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  9. #9
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Quote Originally Posted by vart
    You can make 4 states
    OUT -> COMMENT_LEADIN -> IN -> COMMENT_LEADOUT -> OUT

    The decision can be made based on one char only
    I'm sorry I still don't understand.
    Quote Originally Posted by Cactus_Hugger
    It doesn't account for string literals (admittedly, that was a cruel test), but it also trips over division.

    Also, AFAIK, "wt" and "rt" are not valid modes for fopen(). (Perhaps "w+" and "r+"?) See the man page on fopen().
    I'm sorry I wasn't clear enough: it only deletes C comments (/* */), not C++ comments (//), although C compilers now support them. And "rt" and "wt" are valid modes for fopen(); the 't' is for "text". I know it's not needed, but I write it to make it clear that it's a text file.
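    On the mode-string question: the C standard lists "r" and "w" for text streams, while the 't' suffix is an extension that some compilers' libraries accept. A minimal sketch of a portable open-and-copy follows, assuming a hypothetical helper name copy_text_file (not from the posts above). It also checks the result of fopen(), which the programs in this thread skip.

```c
#include <stdio.h>

/* Hypothetical helper: copy the file at `from` to `to`, one character
   at a time. Returns 0 on success, -1 if either file cannot be opened. */
int copy_text_file(const char *from, const char *to)
{
    FILE *src;
    FILE *dst;
    int c;                       /* int, so EOF stays distinguishable */

    src = fopen(from, "r");      /* "r": text mode in standard C */
    if (src == NULL)
        return -1;

    dst = fopen(to, "w");        /* "w": create or truncate, text mode */
    if (dst == NULL) {
        fclose(src);
        return -1;
    }

    while ((c = getc(src)) != EOF)
        putc(c, dst);

    fclose(src);
    fclose(dst);
    return 0;
}
```

    Calling copy_text_file("C:\\1-23.c", "C:\\src.c") and testing the return value would report a missing input file instead of reading from a null pointer.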

  10. #10
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    I mean you can implement something like the following state-change algorithm, based only on the one character just read:

    state = OUT:
    if char == '/' -> state = LEADIN
    else -> state = OUT

    state = LEADIN:
    if char == '*' -> state = IN
    else if char == '/' -> state = LEADIN
    else -> state = OUT

    state = IN:
    if char == '*' -> state = LEADOUT
    else -> state = IN

    state = LEADOUT:
    if char == '/' -> state = OUT
    else if char == '*' -> state = LEADOUT
    else -> state = IN
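    The transition table above maps almost directly onto a switch statement. Here is a minimal sketch of just the transitions (no output yet); the S_ prefixed names and the functions next_state and scan are illustrative choices, not something from the original posts.

```c
/* The four states from the table above; the S_ prefix is only there
   to avoid clashing with other identifiers. */
enum state { S_OUT, S_LEADIN, S_IN, S_LEADOUT };

/* Return the state that follows `s` after reading character `c`. */
enum state next_state(enum state s, int c)
{
    switch (s) {
    case S_OUT:
        return c == '/' ? S_LEADIN : S_OUT;
    case S_LEADIN:
        if (c == '*')
            return S_IN;
        return c == '/' ? S_LEADIN : S_OUT;
    case S_IN:
        return c == '*' ? S_LEADOUT : S_IN;
    case S_LEADOUT:
        if (c == '/')
            return S_OUT;
        return c == '*' ? S_LEADOUT : S_IN;
    }
    return s; /* not reached for valid states */
}

/* Feed a whole string through the machine and report the final state. */
enum state scan(const char *text)
{
    enum state s = S_OUT;
    for (; *text != '\0'; text++)
        s = next_state(s, (unsigned char)*text);
    return s;
}
```

    For example, scan("6 / 2") ends in S_OUT because the space after the slash sends S_LEADIN back to S_OUT, so a division is never mistaken for the start of a comment.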

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Like
    Code:
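    enum state {
      s_code,
      s_commentleadin,
      s_comment,
      s_commentleadout,
      s_string,
      s_escaped
    };
    
    /* fp and out are FILE* streams already opened for reading and writing */
    enum state s = s_code;
    int c;
    while ( (c=fgetc(fp)) != EOF ) {
      switch ( s ) {
        case s_code:
          if ( c == '/' ) s = s_commentleadin;  /* hold the '/' back for now */
          else fputc( c, out );
          break;
        case s_commentleadin:
          if ( c == '*' ) s = s_comment;
          else if ( c == '/' ) fputc( '/', out );  /* write the earlier '/', keep holding this one */
          else { fputc( '/', out ); fputc( c, out ); s = s_code; }
          break;
        default:
          /* s_comment, s_commentleadout, s_string and s_escaped go here */
          break;
      }
    }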
    Read about finite state machines
    http://en.wikipedia.org/wiki/Finite_state_machine
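    For reference, here is one way the remaining cases might be filled in. This is a sketch under assumptions, not the code from the post: the function name strip_comments is made up, character constants such as '"' are not handled, and // comments are copied through unchanged (so a /* inside one would still be treated as a comment start).

```c
#include <stdio.h>

enum state {
    s_code, s_commentleadin, s_comment, s_commentleadout, s_string, s_escaped
};

/* Hypothetical helper: copy `in` to `out`, dropping C-style comments
   while leaving the contents of string literals untouched. */
void strip_comments(FILE *in, FILE *out)
{
    enum state s = s_code;
    int c;

    while ((c = fgetc(in)) != EOF) {
        switch (s) {
        case s_code:
            if (c == '/') {
                s = s_commentleadin;      /* hold the '/' back for now */
            } else {
                if (c == '"')
                    s = s_string;
                fputc(c, out);
            }
            break;
        case s_commentleadin:
            if (c == '*') {
                s = s_comment;            /* the held '/' is discarded */
            } else if (c == '/') {
                fputc('/', out);          /* write the earlier '/', hold this one */
            } else {
                fputc('/', out);          /* it was just a slash after all */
                fputc(c, out);
                s = (c == '"') ? s_string : s_code;
            }
            break;
        case s_comment:
            if (c == '*')
                s = s_commentleadout;
            break;
        case s_commentleadout:
            if (c == '/')
                s = s_code;
            else if (c != '*')
                s = s_comment;
            break;
        case s_string:
            fputc(c, out);
            if (c == '\\')
                s = s_escaped;
            else if (c == '"')
                s = s_code;
            break;
        case s_escaped:
            fputc(c, out);                /* escaped char never ends the string */
            s = s_string;
            break;
        }
    }
    if (s == s_commentleadin)
        fputc('/', out);                  /* input ended on a lone '/' */
}
```

    With this arrangement, both the division and the string literal in Cactus_Hugger's test input should pass through unchanged, because a '/' is only dropped once a '*' follows it outside a string.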

  12. #12
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Thanks a lot. Now I understand. I might change it or just keep it as is. Are there any advantages to using this method?

  13. #13
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    I don't see any immediate ones...
