Thread: Help with writing a preprocessor to remove comments

  1. #1
    Registered User
    Join Date
    Jan 2011
    Posts
    75

    Help with writing a preprocessor to remove comments

    Hi,

    I have a homework assignment that is giving me trouble. As you can see from the code I have started, I am just stuck. This code (bad as it is) is 100% mine, no copying and pasting. The assignment is to write a program that removes comments and replaces each one with a space. A comment in our case starts with /* and ends with */, so if you had /*abc*/, the whole comment would be replaced with a single space. The program must work with the standard input, output and error streams. It is called decomment, and running it would look something like this:
    decomment < somefile.c > somefilewithoutcomments.c 2> errorandwarningmessages

    I'm a bit lost, but I'm more than willing to put in the work to get it right. Any help is greatly appreciated. My code isn't great and needs work. I'm not even sure what to return: I know it needs to return success or failure (errors), but I don't know how to do that. As I said, I will work to make it better.


    Code:
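    #include <stdio.h>

    int main(void)
    {
        /* The four states of the comment scanner. */
        enum state { Out, Slash, In, Star };
        enum state state = Out;
        int c;

        /* Process the input one character at a time. */
        while ((c = getchar()) != EOF) {
            switch (state) {
            case Out:               /* ordinary code */
                if (c == '/')
                    state = Slash;
                break;

            case Slash:             /* just saw a '/' outside a comment */
                if (c == '*')
                    state = In;
                else if (c != '/')  /* another '/' could still start a comment */
                    state = Out;
                break;

            case In:                /* inside a comment */
                if (c == '*')
                    state = Star;
                break;

            case Star:              /* inside a comment, just saw a '*' */
                if (c == '/')
                    state = Out;
                else if (c != '*')
                    state = In;
                break;
            }
        }

        /* Nothing is written to stdout yet; only the state transitions exist. */
        return 0;
    }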

  2. #2
    whiteflags (Lurking)
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    So you want to copy a C file without its comments? Then your program actually has to write the file out. Apart from the obvious syntax errors, you need to plan for meeting each of these characters on its own where it is not part of a comment, for example in division or multiplication expressions. You need to see a '/' followed by a '*' to recognize a comment. When you do, you output a space as required. When you find the two characters in reverse order (*/), you know where to start copying the file again.
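A minimal sketch of the loop whiteflags describes, assuming the four states from the first post and a hypothetical function name decomment(): ordinary characters are echoed, a '/' is held back until the next character shows whether it opens a comment (so division like a/b survives), and each whole comment becomes a single space. It returns the final state so a caller can spot an unterminated comment. String literals are not handled in this sketch.

```c
#include <stdio.h>

/* States from the original post. */
enum state { Out, Slash, In, Star };

/* Hypothetical helper: copy `in` to `out`, replacing each comment with one
   space. Returns the state at end of input; In or Star means the last
   comment was never closed. */
enum state decomment(FILE *in, FILE *out)
{
    enum state s = Out;
    int c;

    while ((c = getc(in)) != EOF) {
        switch (s) {
        case Out:
            if (c == '/')
                s = Slash;          /* hold the '/' back for now */
            else
                putc(c, out);
            break;
        case Slash:
            if (c == '*') {
                putc(' ', out);     /* comment starts: emit its single space */
                s = In;
            } else if (c == '/') {
                putc('/', out);     /* first '/' was ordinary; this one may start a comment */
            } else {
                putc('/', out);     /* it was just division, e.g. a/b */
                putc(c, out);
                s = Out;
            }
            break;
        case In:
            if (c == '*')
                s = Star;
            break;
        case Star:
            if (c == '/')
                s = Out;            /* comment ends; resume copying */
            else if (c != '*')
                s = In;
            break;
        }
    }
    if (s == Slash)
        putc('/', out);             /* input ended right after a '/' */
    return s;
}
```

Reading standard input and writing standard output is then just decomment(stdin, stdout) inside main.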

  3. #3
    Registered User
    Join Date
    Jan 2011
    Posts
    75
    Well, if you mean the other C file, the one whose comments will be removed, there are many files I can choose from; there isn't one in particular. The program that does the decommenting is what I need help with.

  4. #4
    whiteflags (Lurking)
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    I'm referring to what your program should actually do.

  5. #5
    Registered User
    Join Date
    Jan 2011
    Posts
    75
    Well yes, I guess I'm just not sure how. I have the switch statement, and anything in the In or Star state is a comment and should be replaced with a space. How to make it do that, and how to report errors, such as when someone starts a comment but never finishes it, is what I need help with.
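For the two questions in this post, one hedged approach: write the single space at the moment a comment opens (on the Slash to In transition, not once for every character inside it), remember the line where it opened, and if input ends while still in In or Star, print a message on stderr and return a failure status. The name decomment_report(), the message wording and the way lines are counted are assumptions; the assignment may specify its own format.

```c
#include <stdio.h>
#include <stdlib.h>

enum state { Out, Slash, In, Star };

/* Hypothetical helper: a decomment loop that also tracks line numbers and
   reports an unterminated comment on `err`.
   Returns EXIT_SUCCESS or EXIT_FAILURE, suitable for returning from main. */
int decomment_report(FILE *in, FILE *out, FILE *err)
{
    enum state s = Out;
    int c;
    int line = 1;           /* current input line */
    int comment_line = 0;   /* line on which the open comment started */

    while ((c = getc(in)) != EOF) {
        switch (s) {
        case Out:
            if (c == '/') s = Slash;
            else putc(c, out);
            break;
        case Slash:
            if (c == '*') {
                putc(' ', out);         /* one space for the whole comment */
                comment_line = line;
                s = In;
            } else {
                putc('/', out);         /* the held '/' was not an opener */
                if (c != '/') { putc(c, out); s = Out; }
            }
            break;
        case In:
            if (c == '*') s = Star;
            break;
        case Star:
            if (c == '/') s = Out;
            else if (c != '*') s = In;
            break;
        }
        if (c == '\n')
            line++;                     /* count newlines, inside comments too */
    }

    if (s == Slash)
        putc('/', out);
    if (s == In || s == Star) {
        fprintf(err, "Error: line %d: unterminated comment\n", comment_line);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

main can then be just return decomment_report(stdin, stdout, stderr);. With the command line from the first post, the message ends up in errorandwarningmessages, and the exit status tells the shell whether the run succeeded.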

  6. #6
    quzah (ATH0)
    Join Date
    Oct 2001
    Posts
    14,826
    Put in words what should happen. Once you understand what should happen, you should be able to understand how to make that happen.


    Quzah.
    Hope is the first step on the road to disappointment.

  7. #7
    Registered User
    Join Date
    Jan 2011
    Posts
    75
    Sure, I know what to do from a big-picture standpoint. Something like: if state == Star || state == In, then c = ' '. But exactly how to do it and apply it, I'm not sure. I think I just need help getting started. Any help at all is greatly appreciated.

  8. #8
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by tmd2
    Sure, I know what to do from a big-picture standpoint. Something like: if state == Star || state == In, then c = ' '. But exactly how to do it and apply it, I'm not sure. I think I just need help getting started. Any help at all is greatly appreciated.
    You might benefit from writing it as rules...

    1) when I find /*
    2) I ignore everything until
    3) I find */

    You might also find it a lot easier to treat them as two-character strings rather than as single characters. A function like strstr() from <string.h> might prove very helpful here.

    To make parsing easier, you might also want to load the whole file into memory as a single char buffer and work through it as if it were one giant string.
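If you do want the whole-buffer approach, note that when the program reads standard input through a redirect it has no file name to ask for a size, so growing the buffer with realloc() is the safer assumption. A minimal sketch; read_all() is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of `in` (for example stdin) into one
   NUL-terminated, malloc'd buffer. The size is not known in advance, so
   the buffer doubles as needed. Stores the byte count in *len and returns
   NULL if memory runs out. The caller must free() the result. */
char *read_all(FILE *in, size_t *len)
{
    size_t cap = 4096, n = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(in)) != EOF) {
        if (n + 1 >= cap) {                 /* keep room for the final '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[n++] = (char)c;
    }
    buf[n] = '\0';                          /* lets strstr() treat it as a string */
    *len = n;
    return buf;
}
```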

    Then, finally there's the little problem of nested comments which happen when a section of code containing comments is commented out...
    Last edited by CommonTater; 01-29-2011 at 07:27 PM.

  9. #9
    claudiu (Registered User)
    Join Date
    Feb 2010
    Location
    London, United Kingdom
    Posts
    2,094
    Nope, read again what quzah said. You still can't describe what you want to happen in natural language, so naturally you can't program it either.
    1. Get rid of gets(). Never ever ever use it again. Replace it with fgets() and use that instead.
    2. Get rid of void main and replace it with int main(void) and return 0 at the end of the function.
    3. Get rid of conio.h and other antiquated DOS crap headers.
    4. Don't cast the return value of malloc, even if you always always always make sure that stdlib.h is included.

  10. #10
    Registered User
    Join Date
    Jan 2011
    Posts
    75
    Hey Tater,

    Sure, let me take this one step at a time. How would I load the file into memory and treat it as a string? She wants us to work a character at a time, so I don't think I can use that option.

  11. #11
    Registered User
    Join Date
    Jan 2011
    Posts
    75
    OK, in plain English, what I want to do is this: write a program that takes another C program and replaces anything between /* and */ with a space, so that the result is decommented. I also want to write to stderr if, for example, a comment is started but not finished: a /* without a matching */ later in the text should produce an error.

  12. #12
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by tmd2
    Hey Tater,

    Sure, let me take this one step at a time. How would I load the file into memory and treat it as a string? She wants us to work a character at a time, so I don't think I can use that option.
    Sure you can. You use directory functions to get the file size, then malloc() to create a character buffer, open the file in binary mode, read the whole file into the buffer in one pass, and close the file. It's done all the time.

    (Hint: There's nothing wrong with char *buffer = malloc(filesize + 1); the extra byte holds the terminating '\0' that strstr() needs. Most machines have gigabytes of memory, and most C files are quite small, less than 50k.)

    Once you have the whole thing as one continuous buffer, just use strstr(buffer, "/*") to find the first occurrence, then search from there to find its complement (*/). Now move the text after the second pointer down to the first pointer, adjust your file size, and repeat. (memmove() is especially useful here; memcpy() is not safe, because the source and destination overlap.)

    For a nested-comments implementation you do essentially the same thing, except that you check between matching pairs: find the /*, find its complement, and search again from the first one; if you find a second or third opener, you need to find the same number of closers. Once you've found the outermost pair, move the text forward and do it again.

    Once you've found them all... Open your file in binary mode, write the remaining text out in a single pass, close the file and you're done.

    Believe me it will be far harder to try to do this line by line, especially when comments can span dozens of lines.
    Last edited by CommonTater; 01-29-2011 at 07:59 PM.

  13. #13
    whiteflags (Lurking)
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote Originally Posted by CommonTater
    Then, finally there's the little problem of nested comments which happen when a section of code containing comments is commented out...
    Quote Originally Posted by C89 Draft
    3.1.9 Comments

    Except within a character constant, a string literal, or a comment, the characters /* introduce a comment. The contents of a comment are examined only to identify multibyte characters and to find the characters */ that terminate it.[21]

    [21] Thus comments do not nest.
    AFAICT, this has not changed since then.
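The quoted text also says /* only starts a comment outside character constants and string literals, so a line like puts("/* hi */"); must be left alone. A hedged sketch of how the state machine can grow extra states for literals and their backslash escapes; decomment_str() is a hypothetical string-to-string version chosen to keep the example short, and // comments are deliberately not handled because the assignment only defines /* */ comments.

```c
/* Hypothetical helper: decomment `src` into `dst`, copying string literals
   and character constants unchanged, so a comment opener inside quotes is
   not treated as a comment. `dst` needs room for strlen(src) + 1 chars; the
   output is never longer than the input.
   Returns 1 if the input ended inside a comment, else 0. */
int decomment_str(const char *src, char *dst)
{
    enum { Out, Slash, In, Star, Str, StrEsc, Chr, ChrEsc } s = Out;
    char c;

    while ((c = *src++) != '\0') {
        switch (s) {
        case Out:
            if (c == '/') {
                s = Slash;              /* hold the '/' back */
                break;
            }
            if (c == '"')
                s = Str;                /* entering a string literal */
            else if (c == '\'')
                s = Chr;                /* entering a character constant */
            *dst++ = c;
            break;
        case Slash:
            if (c == '*') {
                *dst++ = ' ';           /* one space per comment */
                s = In;
                break;
            }
            *dst++ = '/';               /* the held '/' was not an opener */
            if (c == '/')
                break;                  /* this '/' might be; stay in Slash */
            if (c == '"')
                s = Str;
            else if (c == '\'')
                s = Chr;
            else
                s = Out;
            *dst++ = c;
            break;
        case In:
            if (c == '*')
                s = Star;
            break;
        case Star:
            if (c == '/')
                s = Out;
            else if (c != '*')
                s = In;
            break;
        case Str:
        case Chr:
            if (c == '\\')
                s = (s == Str) ? StrEsc : ChrEsc;   /* next char is escaped */
            else if (c == (s == Str ? '"' : '\''))
                s = Out;                            /* matching closing quote */
            *dst++ = c;
            break;
        case StrEsc:
        case ChrEsc:
            s = (s == StrEsc) ? Str : Chr;          /* copy escaped char as-is */
            *dst++ = c;
            break;
        }
    }
    if (s == Slash)
        *dst++ = '/';                   /* input ended right after a '/' */
    *dst = '\0';
    return s == In || s == Star;
}
```

The same states drop straight into a getchar()/putchar() loop for the stdin/stdout version.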

  14. #14
    Registered User
    Join Date
    Jan 2011
    Posts
    75
    Thanks Tater,

    What you described seems more advanced than where we are. I have no idea how to do that. Keep in mind our programs are very small right now. I don't know; I'm confused about the whole thing.

  15. #15
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Ok... I give up...

    These last few days it seems like every darned thing I say just gets chewed up and spit back at me. Enjoy your forums, people... I've had all of this I'm going to take.

    Bye!

