Thread: Removing comments from textual files

  1. #1
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715

    Removing comments from textual files

    Hello,
    my friend asked me to write a simple program that opens a file and removes all commented lines.
    Here's what I made:
    Code:
    #include <iostream>
    #include <fstream>
    #include <string>
    
    using namespace std;
    
    int main ()
    {
        ifstream infile("Test.txt");
        ofstream outfile("Test2.txt");
        string str, search_start="/*", search_end = "*/";
        string :: size_type siz;
        bool comment = false;//flag that indicates if line is commented
        //read line
        while (getline(infile, str))
        {
            //write line to file if there is no markers /*, */ or line is not commented
            if (str.find("/*") == string :: npos && str.find("*/") == string :: npos && !comment)
            {
                outfile << str << endl;
            }
            else //line is inside a comment or contains a comment marker
            {
                comment = true;
                siz = str.find("*/");//try to find end of comment
                if (siz != string :: npos)
                {
                    //comment ends on this line: clear the flag and read next line
                    comment = false;
                    continue;
                }
                //otherwise the flag stays set, so the following lines are
                //skipped until one containing the end of comment marker is read
            }
        }
        return 0;
    }
    This code does what it's supposed to do, but I wonder if it could be written more elegantly.
    I think a better solution might be to use a temp file: if everything goes OK, the contents of the temp file overwrite the original file. Of course, error checking should detect an unfinished comment, to prevent losing code if there is a mistake in the comments.
    Tell me what you think, and perhaps suggest a better solution.

    Thanks
    Gotta love the "please fix this for me, but I'm not going to tell you which functions we're allowed to use" posts.
    It's like teaching people to walk by first breaking their legs - muppet teachers! - Salem

  2. #2
    The superhaterodyne twomers's Avatar
    Join Date
    Dec 2005
    Location
    Ireland
    Posts
    2,273
    I'd say, instead of using a temp file, why not use a temp string? That way, once everything is saved into it, you can just write it back to the place the file came from. But I'd have a small menu first to ask whether or not they want to overwrite the file.

    Code:
    ifstream in  ( "file1.txt" );
    
    string 	line = "",
    	code = "";
    
    while ( getline( in, line ) )
    {
    	if ( "//" isn't found ) code += line;
    	else if ( /* isn't found ) code += line;
    	else if ( /* is found, but half ways through line ) 
    	{	
    		code += line.substr( whereever );
    
    		while ( getline( in, line ) )
    		{
    			look for a */, and break;
    		}
    	}
    	code += "\n";
    }
    
    in.close();
    
    ofstream out ( "file1.txt" );
    // ERROR CHECKING
    out << code;
    out.close();
    or something like that anyways.

  3. #3
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    Thanks, interesting idea. I also need to handle the case where a comment is on the same line as code, for example
    Code:
    cout<<x;/*comment*/
    I'll post second version soon.

  4. #4
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    It will also need to handle comments that span two or more lines:
    Code:
    /*
    comments here
    */

  5. #5
    The larch
    Join Date
    May 2006
    Posts
    3,573
    You should also be prepared for cases such as
    Code:
    /*comment
    //comment*/
    int main()
    I think it shows you must handle /* and */ first, using the string find method: if you find the beginning of a comment, look for the end marker and delete everything in between, and repeat until the string contains no more /* substrings. Only then can you start looking for "//" comments, but this time you erase everything up to the next line ending.

    Otherwise, as the example shows you might accidentally remove the end-of-comment sign.

  6. #6
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    You also need to handle comments within strings - they're not comments.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  7. #7
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    Hehe, a task that looked trivial at first glance is becoming more and more interesting
    Last edited by Micko; 08-07-2006 at 02:01 PM.

  8. #8
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    This is newer version:
    Code:
    #include <iostream>
    #include <string> 
    #include <fstream>
    #include <sstream>
    
    using namespace std;
    
    int main()
    {
    	ifstream infile("test.txt");
    	stringstream sstr;
    	sstr<< infile.rdbuf();
    	string :: size_type start_CPP, start_C, end;
    	string str = sstr.str();
    	start_CPP = str.find("//");
    	start_C = str.find("/*");
    
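    	while (start_CPP != string :: npos || start_C != string :: npos)
    	{
    		//handle whichever comment marker comes first (npos compares greater than any position)
    		if (start_CPP < start_C)
    		{
    			//erase up to, but not including, the end of the line
    			end = str.find("\n", start_CPP+2);
    			str.erase(start_CPP, end == string :: npos ? end : end-start_CPP);
    		}
    		else
    		{
    			end = str.find("*/", start_C+2);
    			if (end == string :: npos)
    			{
    				break; //unfinished comment: stop instead of looping forever
    			}
    			str.erase(start_C, end+2-start_C);
    		}
    		start_C = str.find("/*");
    		start_CPP = str.find("//");
    	}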
    	
    	cout <<str;
    	ofstream outfile("res.txt");
    	outfile << str<<endl;
    	return 0;
    }
    However, I still need to think about how to parse the string when /* appears in regular text (inside a string literal, say) and does not start a comment, although that will be rare.
    Can you suggest something?
    Thanks

  9. #9
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    One way is to keep a bool flag. Whenever a double quote is found the flag is set. When a second double quote is found the flag is reset. Thus a true state means inside a quoted text, while a false state means outside.

    However, some rules have to be set:

    - You will not set or reset the flag if the double quote is preceded by \ (i.e. the double quote is part of an escape sequence)

    - You will not set or reset the flag if you are already inside a block of commented text.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  10. #10
    The superhaterodyne twomers's Avatar
    Join Date
    Dec 2005
    Location
    Ireland
    Posts
    2,273
    I made a cboard-color-coder program before, and I had the same problem. I can't remember how I overcame it, but I know I used a LOT of trial and error in solving it. I think what I actually did was put a [ COLOR="RED" ] before an opening ", and a [ /COLOR ] after the closing ", before doing anything else to the string. Then it was just a matter of checking whether or not the offending /* or */ or // etc. was within the bounds of the color. You could assign special characters for it, and do what I did, then delete all instances of those special characters ... don't forget to check for \"'s too

  11. #11
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Best way is a full FSM, a finite state machine. It would have the states normal, in-string, string-escape, in-slcomment, slcomment-escape and in-mlcomment. This is your state transition table:
    normal + " -> in-string
    normal + // -> in-slcomment
    normal + /* -> in-mlcomment
    in-string + \ -> string-escape
    string-escape + anything -> in-string
    in-string + " -> normal
    in-slcomment + \ -> slcomment-escape
    slcomment-escape + anything -> in-slcomment
    in-slcomment + newline -> normal
    in-mlcomment + */ -> normal
    normal or in-slcomment + EOF -> done (but emit a warning about missing newline before EOF if in-slcomment, it's a convention to have one in *nix)
    string-escape or slcomment-escape or in-string or in-mlcomment + EOF -> error

    The -escape states are because of strings like "Hello, \"Peter\"!" and because a single-line comment can be continued to the next line with a backslash. (Although this is very bad practice - you might want to emit a warning.)

  12. #12
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    Hmm, thanks, I guess I'll need to read more about finite state machines. I'm not really familiar with them and need to learn how they're programmed.

  13. #13
    Registered User
    Join Date
    Mar 2006
    Posts
    725
    To simplify things you can add a newline to the end of the file, so you won't need to check for EOF when parsing comments. Remove the newline when you are done.

    Don't forget the unstandard combo char '/**/'
    Code:
    #include <stdio.h>
    
    void J(char*a){int f,i=0,c='1';for(;a[i]!='0';++i)if(i==81){
    puts(a);return;}for(;c<='9';++c){for(f=0;f<9;++f)if(a[i-i%27+i%9
    /3*3+f/3*9+f%3]==c||a[i%9+f*9]==c||a[i-i%9+f]==c)goto e;a[i]=c;J(a);a[i]
    ='0';e:;}}int main(int c,char**v){int t=0;if(c>1){for(;v[1][
    t];++t);if(t==81){J(v[1]);return 0;}}puts("sudoku [0-9]{81}");return 1;}

  14. #14
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Oh yeah, and the trigraph ??/ is the same as a single \, so you might want to check for that too

  15. #15
    Ethernal Noob
    Join Date
    Nov 2001
    Posts
    1,901
    Maybe you can define strings "c_comm", "cpp_comm" and "end", and process the file line by line. If you find "cpp_comm" using search from <algorithm>, you continue to the next line. If you find "c_comm" (you can make one for the beginning and one for the end), you don't copy any text until after you find */.

    Or you can search for a single character, '/', with an if/else inside. If the character after it is '/', exclude the rest of the line; if it's '*', keep searching for a '*' with a '/' after it.
    Last edited by indigo0086; 08-08-2006 at 09:35 AM.
