Command to strip out comments in source files


  1. #16
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    Code:
        printf("/* Make sure you deal with this case. */");
    Ah, that I didn't think of.

    The code is not in source control.

    Code:
    sed 's@//.*@@g'
    That takes out the "//" comments, but I also need to take out the "/* ... */" ones, which I hear is more difficult because they can span multiple lines.

    I am thinking about just writing a simple one myself in C++, as it doesn't really need to be foolproof. As long as it works for my code...
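
    For comparison, here is a minimal C++ sketch of what the sed one-liner does to a single line. The helper name strip_line_comment is made up for illustration and is not from the thread. It also shows why a plain text match is not foolproof: a "//" inside a string literal gets cut just the same.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the thread): drop everything from the first
// "//" to the end of one line, roughly what sed 's@//.*@@' does.
std::string strip_line_comment(const std::string &line) {
	std::string::size_type pos = line.find("//");
	if (pos == std::string::npos) {
		return line; // no "//" on this line, keep it untouched
	}
	return line.substr(0, pos);
}

// Small self-check showing the intended behaviour and the pitfall.
void check_strip_line_comment() {
	assert(strip_line_comment("int x = 0; // counter") == "int x = 0; ");
	assert(strip_line_comment("int y = 1;") == "int y = 1;");
	// Pitfall: the "//" inside the string literal is cut as well.
	assert(strip_line_comment("puts(\"http://example.com\");") == "puts(\"http:");
}
```

    A "/* ... */" comment cannot be handled one line at a time like this, because whether a given line is inside a comment depends on the lines before it.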

  2. #17
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    My quick-and-dirty solution, with a consecutive-blank-lines stripper:

    Code:
    #include <iostream>
    #include <fstream>
    #include <string>
    
    #include <cassert>
    #include <cctype>
    
    const int max_length = 1024*1024*10; //10 MB
    
    int main(int argc, char **argv) {
    	if (argc < 2) { //argv[1] is only valid if a file name was actually given
    		std::cerr << "Error: no input file given." << std::endl;
    		return 1;
    	}
    	
    	std::string buffer(max_length, '\0'); //owns its memory, so nothing leaks on the early returns
    	std::string out_data;
    	
    	std::ifstream in_file(argv[1], std::ios::in | std::ios::binary);
    	
    	if (!in_file) {
    		std::cerr << "Error opening file: " << argv[1] << std::endl;
    		return 1;
    	}
    	
    	if (in_file.read(&buffer[0], max_length-1)) { //if the read was successful without hitting an EOF...
    		std::cerr << "File too big." << std::endl;
    		return 1;
    	}
    	
    	std::string in_data(buffer, 0, in_file.gcount()); //keep only the bytes actually read, as a string so we have nice functions to play with
    	
    	//first replace every "\r" by space
    	for (int i = 0; i < in_data.size(); ++i) {
    		if (in_data[i] == '\r') {
    			in_data[i] = ' ';
    		}
    	}
    	
    	int state = 0; //0 = normal, 1 = ignore until end of line, 2 = ignore until "*/", 3 = ignore until "#endif"
    	
    	int skip_next = 0;
    	
    	for (int i = 0; i < in_data.size(); ++i) {
    		
    		if (skip_next) {
    			--skip_next;
    			continue;
    		}
    		
    		switch(state) {
    			case 0:
    				if ((i + 2 <= in_data.size()) && (in_data.substr(i, 2) == "//")) {
    					state = 1;
    				} else if ((i + 2 <= in_data.size()) && (in_data.substr(i, 2) == "/*")) {
    					state = 2;
    					skip_next = 1; //skip the '*' so that "/*/" is not mistaken for a complete comment
    				} else if ((i + 5 <= in_data.size()) && (in_data.substr(i, 5) == "#if 0")) {
    					state = 3;
    				} else {
    					out_data.push_back(in_data[i]);
    				}
    				break;
    			case 1:
    				if (in_data[i] == '\n') {
    					out_data.push_back('\n');
    					state = 0;
    				}
    				break;
    			case 2: 
    				if ((i + 2 <= in_data.size()) && (in_data.substr(i, 2) == "*/")) {
    					out_data.push_back('\n');
    					state = 0;
    					skip_next = 1;
    				}
    				break;
    			case 3: 
    				if ((i + 6 <= in_data.size()) && (in_data.substr(i, 6) == "#endif")) {
    					out_data.push_back('\n');
    					state = 0;
    					skip_next = 5;
    				}
    				break;
    			default:
    				assert(false);
    		}
    		
    	}
    	
    	int n_count = 0;
    	for (int i = 0; i < out_data.size(); ++i) {
    		if (n_count > 1 && (out_data[i] == '\n' || out_data[i] == '\t' || out_data[i] == ' ')) {
    			continue;
    		} else {
    			if (out_data[i] != '\t' && out_data[i] != '\n' && out_data[i] != ' ') {
    				if (n_count > 1) { //get all the junk we threw away back!
    					int j = i;
    					while (out_data[j] != '\n') {
    						--j;
    					}
    					for (++j; j < i; ++j) {
    						std::cout << out_data[j];
    					}
    				}
    				n_count = 0;
    			}
    			if (out_data[i] == '\n') {
    				++n_count;
    			}
    			std::cout << out_data[i];
    		}
    	}
    }
    It's unbreakable until you try to break it.

    EDIT: code edited for the consecutive-blank-lines-stripper.
    Last edited by cyberfish; 11-12-2008 at 10:42 PM.
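
    The program above still treats comment markers inside string literals as comments, which is the printf case from post #16. One possible way to cover it is to add states for string and character literals. This is only a sketch under stated assumptions, not the code from the post: the name strip_comments is hypothetical, a block comment is replaced by a single space rather than a newline, and raw string literals, backslash-newline continuations and "#if 0" blocks are not handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not the code from the post above: strips // and /* */
// comments but copies string and character literals through untouched.
std::string strip_comments(const std::string &in) {
	enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING_LIT, CHAR_LIT };
	State state = CODE;
	std::string out;

	for (std::string::size_type i = 0; i < in.size(); ++i) {
		const char c = in[i];
		const char next = (i + 1 < in.size()) ? in[i + 1] : '\0';

		switch (state) {
			case CODE:
				if (c == '/' && next == '/') {
					state = LINE_COMMENT;
					++i; // consume the second '/'
				} else if (c == '/' && next == '*') {
					state = BLOCK_COMMENT;
					++i; // consume the '*' so "/*/" does not close itself
				} else {
					if (c == '"') state = STRING_LIT;
					else if (c == '\'') state = CHAR_LIT;
					out.push_back(c);
				}
				break;
			case LINE_COMMENT:
				if (c == '\n') { // the newline ends the comment and is kept
					out.push_back('\n');
					state = CODE;
				}
				break;
			case BLOCK_COMMENT:
				if (c == '*' && next == '/') {
					out.push_back(' '); // stand-in for the removed comment
					state = CODE;
					++i; // consume the '/'
				}
				break;
			case STRING_LIT:
			case CHAR_LIT:
				out.push_back(c);
				if (c == '\\' && i + 1 < in.size()) {
					out.push_back(next); // copy the escaped character as-is
					++i;
				} else if ((state == STRING_LIT && c == '"') ||
				           (state == CHAR_LIT && c == '\'')) {
					state = CODE; // closing quote
				}
				break;
		}
	}
	return out;
}

// Small self-check, including the case from earlier in the thread.
void check_strip_comments() {
	const std::string tricky =
		"printf(\"/* Make sure you deal with this case. */\");";
	assert(strip_comments(tricky) == tricky);
	assert(strip_comments("a/*x*/b") == "a b");
	assert(strip_comments("a//x\nb") == "a\nb");
}
```

    Tracking the escape character inside literals matters: without it, a string such as "a\"//b" would appear to end at the escaped quote and the rest would be stripped as a comment.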

  3. #18
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,048
    I've never put it to the test, but I have a program called rmccmt on my Linux machine ("remove c/c++ comments").

    I do have a program which calculates how much source code consists of comments. I tested it pretty thoroughly, and I could post it here if you wanted, but I suspect you'd be able to figure it out yourself without too much difficulty.

    @cyberfish: About your code: consider looking at isspace().

  4. #19
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,270
    Quote Originally Posted by dwks
    @cyberfish: About your code: consider looking at isspace().
    isspace() is locale-specific, but the language itself isn't. I think it's better to explicitly check for the characters which are declared by the standard to be "whitespace."
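
    A minimal sketch of such an explicit check, assuming the white-space characters listed by the C and C++ standards for source text (space, horizontal tab, new-line, vertical tab, form feed). The name is_source_whitespace is hypothetical. Carriage return is deliberately left out here; the program earlier in the thread already turns every '\r' into a space before scanning.

```cpp
#include <cassert>

// Hypothetical helper: true for the characters the C and C++ standards list
// as white space in source text, independent of the current locale.
bool is_source_whitespace(char c) {
	switch (c) {
		case ' ':  // space
		case '\t': // horizontal tab
		case '\n': // new-line
		case '\v': // vertical tab
		case '\f': // form feed
			return true;
		default:
			return false;
	}
}

// Small self-check of the helper.
void check_is_source_whitespace() {
	assert(is_source_whitespace(' '));
	assert(is_source_whitespace('\t'));
	assert(!is_source_whitespace('x'));
	assert(!is_source_whitespace('\r')); // not in the standards' list
}
```

    Unlike isspace(), this takes a plain char safely; isspace() has undefined behaviour for negative values other than EOF, so it needs a cast to unsigned char first.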

  5. #20
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    Quote Originally Posted by dwks
    I've never put it to the test, but I have a program called rmccmt on my Linux machine ("remove c/c++ comments").
    That is strange. I have Ubuntu 8.04 and don't have it. Not even in the APT repository.

    *edit* Ah, never mind, it's in liwc. */edit*

    *edit2* It messes up indentation, too, for some reason, and doesn't catch "#if 0"/"#endif" blocks. Guess I will just use mine. */edit2*
    Last edited by cyberfish; 11-13-2008 at 02:52 PM.
