Thread: Command to strip out comments in source files

  1. #1
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229

    I am trying to find a command to strip out comments in C++ source files. I suspect it would be sed, but I don't know how to use it yet and don't have time to learn it right now. Any sed wizard care to spare a few seconds?

    I need to strip out both "//" and "/* ... */" style comments. If possible, "#if 0/#endif" ones, too.

    Thanks

  2. #2
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    Do you use a lot of #defines and #includes? You could maybe use the actual preprocessor. Otherwise I'm not a sed wizard either

  3. #3
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    What do you mean by use the actual preprocessor?

    I am trying to strip out comments because my partner will only allow me to open-source our program (a chess engine) without the comments in a source file (since there is quite a bit of private discussion going on in the comments about novel ideas for the chess engine that he prefers to keep private for now, though he doesn't mind the code being open).

  4. #4
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Ah I see what you mean now. I don't think that's a good idea because I need to do it for every release... I think it's "messier" than a sed solution, too.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    My guess would be that there are A LOT of comments that do not relate to any "new ideas", whilst a few comments do. How about working through the code and marking those comments that you don't want to publish with a marker, and then write a small application that strips only those comments?

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    The discussion is actually quite scattered, throughout the whole file, because a lot of ideas are being discussed, and the comments are put where they are relevant.

  7. #7
    zacs7 (Woof, woof!)
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Not sure if this will do what you want
    Quote Originally Posted by man gcc
    -fpreprocessed
    Indicate to the preprocessor that the input file has already been
    preprocessed. This suppresses things like macro expansion,
    trigraph conversion, escaped newline splicing, and processing of
    most directives. The preprocessor still recognizes and removes
    comments, so that you can pass a file preprocessed with -C to the
    compiler without problems. In this mode the integrated
    preprocessor is little more than a tokenizer for the front ends.

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by cyberfish
    The discussion is actually quite scattered, throughout the whole file, because a lot of ideas are being discussed, and the comments are put where they are relevant.
    Yes, but my point was that a large portion of the comments probably do not relate to new ideas, but rather to what the code actually does, and thus are useful for others who want to help improve your code. Presumably part of the point of your open source release is to gain from other people's experience, and to let others learn from you? Completely stripping ALL comments will reduce the readability.

    Of course, it may be that your code doesn't have much in the way of comments, EXCEPT for the comments on new ideas - in which case please ignore my post here.

    I don't think pre-processing is a great idea - it will drag in all the header files that the project uses. Those header files are (at least in part) compiler and OS dependent, so whilst it will remove the comments, it will also make the code completely unportable.

    Likewise for any adaptation that uses something like:
    Code:
    #if WIN32
    ...
    #else
    ...
    #endif
    --
    Mats

  9. #9
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    > since there is quite a bit of private discussion going on in the comments about novel ideas for the chess engine
    I'd suggest you start a local Wiki for that, and leave the comments in the code documenting the facts rather than the ideas.

    Otherwise, as you say, it's going to get messy (and error prone with it).
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  10. #10
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Ah I see.
    > -fpreprocessed
    That kind of works, but it applies "#define", and doesn't remove "#if 0/#endif" blocks. It also seriously messes up the indentation.

    > Yes, but my point was that a large portion of the comments probably do not relate to new ideas, but rather to what the code actually does, and thus are useful for others who want to help improve your code. Presumably part of the point of your open source release is to gain from other people's experience, and to let others learn from you? Completely stripping ALL comments will reduce the readability.
    That is a good point, but I think the source is pretty much self-documenting. No complex algorithms and such. Just simple evaluation terms.

    > I'd suggest you start a local Wiki for that, and leave the comments in the code documenting the facts rather than the ideas.
    That is a good idea, except it will be quite a bit of work, so I'd rather just find something to strip out the comments.

    I suppose I can write some C++ code to do it... but would prefer just a sed command (or I can take the time to learn sed, of course).
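
    For the "#if 0/#endif" part, which -fpreprocessed leaves alone, here is a minimal line-based sketch in C++. The names strip_if0, parse_directive and Frame are made up for illustration. It assumes the directive is written exactly as "#if 0" (optionally with spaces around the '#'), and it does not look inside comments or string literals, so a "#if 0" that appears in one of those would be misread.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Kind of conditional block we are currently inside (hypothetical names).
enum class Frame { Normal, Zero, ZeroElse };

// Splits a line such as "  # if 0" into name ("if") and argument ("0").
// Returns false when the line is not a preprocessor directive.
bool parse_directive(const std::string& line, std::string& name, std::string& arg)
{
    const char* const ws = " \t\r";
    const std::size_t npos = std::string::npos;

    std::size_t i = line.find_first_not_of(ws);
    if (i == npos || line[i] != '#') return false;
    i = line.find_first_not_of(ws, i + 1);
    if (i == npos) return false;                 // a lone '#'

    const std::size_t j = line.find_first_of(ws, i);
    name = line.substr(i, j == npos ? npos : j - i);
    arg.clear();
    if (j != npos) {
        const std::size_t k = line.find_first_not_of(ws, j);
        if (k != npos) arg = line.substr(k, line.find_last_not_of(ws) - k + 1);
    }
    return true;
}

// Drops "#if 0" branches. A following "#else" branch is kept (without its
// "#else"/"#endif" lines); a following "#elif X" is rewritten to "#if X".
// Every kept line is written back with a trailing '\n'.
std::string strip_if0(const std::string& src)
{
    std::istringstream in(src);
    std::vector<Frame> stack;
    bool dead = false;            // true while inside a disabled "#if 0" branch
    std::string out, line, name, arg;

    while (std::getline(in, line)) {
        bool keep = !dead;
        if (parse_directive(line, name, arg)) {
            if (name == "if" || name == "ifdef" || name == "ifndef") {
                const bool zero = !dead && name == "if" && arg == "0";
                stack.push_back(zero ? Frame::Zero : Frame::Normal);
                if (zero) { dead = true; keep = false; }
            } else if ((name == "else" || name == "elif") && !stack.empty()) {
                if (stack.back() == Frame::Zero) {
                    dead = false;
                    if (name == "else") {
                        stack.back() = Frame::ZeroElse;
                        keep = false;
                    } else {
                        stack.back() = Frame::Normal;
                        line = "#if " + arg;
                        keep = true;
                    }
                }
            } else if (name == "endif" && !stack.empty()) {
                const Frame f = stack.back();
                stack.pop_back();
                if (f == Frame::Zero) { dead = false; keep = false; }
                else if (f == Frame::ZeroElse) keep = false;
            }
        }
        if (keep) { out += line; out += '\n'; }
    }
    return out;
}
```

    Ordinary conditionals such as "#if WIN32 ... #else ... #endif" pass through untouched, so this would not introduce the portability problem mentioned for full pre-processing.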

  11. #11
    zacs7 (Woof, woof!)
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Seriously, do it with sed. In the time you've spent looking for alternatives, you could have learnt sed and done it already.

  12. #12
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I tried to learn it just now. The problem is... I don't even know regular expressions, so I have to learn that, too.

  13. #13
    zacs7 (Woof, woof!)
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    They're not that hard, plus they're fun! And very, very useful. Check out http://www.regular-expressions.info/

  14. #14
    brewbuck (Officially An Architect)
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Code:
    #include <stdio.h>

    int main()
    {
        /* The string literal below only looks like a comment. */
        printf("/* Make sure you deal with this case. */");
        return 0;
    }
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  15. #15
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Or maybe
    Code:
    if ( 3 //**/ 3
    == 1 ) {
        printf( "C\n" );
    } else {
        printf( "C++\n" );
    }
    > That is a good idea, except it will be quite a bit of work, so I'd rather just find something to strip out the comments.
    A reliable comment stripper will be quite a lot of work.

    Is your code in source control? Do you have branches and merges?
    Keeping track of where your ideas are seems like hard work to me.
    Also, are you going to be checking in code with only fresh ideas added in comments?

    sed 's@//.*@@g'
    A simple approach, which will do fine for 99% of the C++ comments.
    But there are a few special cases where this is exactly the wrong thing to do, and it's those which will really tax your skill / time / patience.
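
    To make those special cases concrete, here is a minimal sketch of a character-by-character stripper in C++ that tracks whether it is inside a string or character literal. The name strip_comments is made up for illustration. It is not a complete solution: it assumes no raw string literals, no digit separators such as 1'000, no trigraphs, and no backslash-newline continuation at the end of a // comment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes // and /* */ comments while leaving string and
// character literals untouched. A block comment is replaced by one space so
// that the tokens on either side of it do not merge.
std::string strip_comments(const std::string& src)
{
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::string out;

    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';

        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') {
                state = State::LineComment;
                ++i;                              // skip the second '/'
            } else if (c == '/' && next == '*') {
                state = State::BlockComment;
                ++i;                              // skip the '*'
            } else {
                if (c == '"') state = State::String;
                else if (c == '\'') state = State::Char;
                out += c;
            }
            break;

        case State::LineComment:
            if (c == '\n') {                      // the newline itself is kept
                state = State::Code;
                out += c;
            }
            break;

        case State::BlockComment:
            if (c == '*' && next == '/') {
                state = State::Code;
                ++i;                              // skip the closing '/'
                out += ' ';
            }
            break;

        case State::String:
        case State::Char:
            out += c;
            if (c == '\\' && next != '\0') {      // copy escaped char verbatim
                out += next;
                ++i;
            } else if ((state == State::String && c == '"') ||
                       (state == State::Char && c == '\'')) {
                state = State::Code;
            }
            break;
        }
    }
    return out;
}
```

    With this sketch the printf example from post #14 comes through unchanged, and the "3 //**/ 3" line is treated the C++ way, i.e. everything from the "//" to the end of the line is dropped.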

    You might want to look at the doxygen approach, which tags comments.
    So comments which you want to remove always look like this (for example):
    Code:
    /*@@IDEA
    */
    Parsing for these, and only these would be a lot easier than a generic comment stripper.
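
    A minimal sketch of that tagged approach in C++, assuming every private comment starts with exactly "/*@@IDEA" and that this tag never appears inside a string literal. The name strip_tagged is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes every block comment that begins with the tag
// and leaves all other text, including ordinary comments, untouched.
std::string strip_tagged(const std::string& src,
                         const std::string& tag = "/*@@IDEA")
{
    std::string out;
    std::size_t pos = 0;

    while (true) {
        const std::size_t start = src.find(tag, pos);
        if (start == std::string::npos) {
            out.append(src, pos, std::string::npos);  // no more tags: copy rest
            break;
        }
        out.append(src, pos, start - pos);            // text before the tag

        const std::size_t end = src.find("*/", start + tag.size());
        if (end == std::string::npos) {
            break;   // unterminated tagged comment: drop the rest, to be safe
        }
        pos = end + 2;                                // resume after "*/"
    }
    return out;
}
```

    Dropping the remainder on an unterminated tag is a deliberate choice here: for private notes, losing some public text seems less harmful than leaking part of a comment.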
