[C] remove comments (C Programming forum, General Programming Boards)
In reply to quzah: That is automatic concatenation (of adjacent string literals). I am talking about a string literal spanning multiple lines in the physical source code, where the multiple lines become a single line.
C + C++ Compiler: MinGW port of GCC
Version Control System: Bazaar
Look up a C++ Reference and learn How To Ask Questions The Smart Way
Originally Posted by Tool: Yes, that's where I need your help. I know I need 4 states; what I don't know are the conditions for switching between them.
What you need to do is come up with a list of test cases that you want to support. The list should include all the complex possibilities with single quotes, double quotes, multi-line comments and any other possibilities you can think of. Do you need to worry about ??/ used in place of \? What about C++/C99 single-line comments? If yes, write a few examples.
I'm going to work with the ANSI C Standard. So no stuff like //; it seems illogical anyway, since I'm checking a C program, not a C++ program or anything else.
So I want to support everything I can:
1. If I'm in state double_quote and a backslash appears, the character following the backslash (except \n) will be ignored and the state remains double_quote after that.
So I need states: double_quote, dquote_escape
2. If I'm in state double_quote and a newline appears before the closing quote:
Code:
printf("test
test");
This is an error as well (not really an error, just a warning for changing states).
Code:
if(state==dquote && c=='\n') state = normal;
So I think I covered every possibility for the dquote state this way. Is there any other? I can't think of anything else.
As for the single_quote state, I still have no idea what to do about it. I really need some advice here.
Last edited by Tool; 11-15-2009 at 06:14 AM.
The standard which doesn't allow //.
May I get some serious answers now?
Last edited by Tool; 11-15-2009 at 06:24 AM.
Did you actually have a question? You've been given a bunch of hints and suggestions without actually posting any code, so I'm not sure what you're expecting here. Especially since you aren't actually asking questions.
Quzah.
Hope is the first step on the road to disappointment.
I'm looking for a confirmation about the double_quote state as I wrote 2 posts ago.
My question is: Did I miss a possible change from the dquote state, aside from the 2 possibilities I wrote 2 posts ago?
I can't start writing code if I don't know what the exercise asks for.
My 2nd question, to which I didn't get any answer, is: How am I supposed to treat single quotes in this exercise? I don't understand this part at all. You can't open a comment inside a ' ', so my question is what the authors were referring to when they said, "Don't forget to handle character constants properly".
This is the code I'm planning to use. I'm just going to take it from exercise 1-24 and modify it a bit. Where I get boggled is that I don't know what to do about single quotes in this exercise.
Code:
int c;
enum states {
    normal, comment_entry, comment, comment_exit,
    squote, squote_escape, squote2, squote3,
    dquote, dquote_escape
};
...
while((c=getchar())!=EOF)
{
    if(state==normal && c=='/') { state = comment_entry; }
    else if(state==normal && c=='"') { state = dquote; }
    else if(state==normal && c=='\'') { state = squote; }
    else if(state==normal) {
        if((check(c))==0) { printf("Line %d: Mismatching %c\n", line, c); valid=0; }
    }
    else if(state==comment_entry && c=='*') { state = comment; }
    else if(state==comment_entry && c=='/') { }
    else if(state==comment_entry) {
        state = normal;
        /* necessary to check braces here as well */
        if((check(c))==0) { printf("Line %d: Mismatching %c\n", line, c); valid=0; }
    }
    else if(state==comment && c=='*') { state = comment_exit; }
    else if(state==comment) { }
    else if(state==comment_exit && c=='*') { }
    else if(state==comment_exit && c=='/') { state = normal; }
    else if(state==comment_exit) { state = comment; }
    else if(state==dquote && c=='"') { state = normal; }
    else if(state==dquote && c=='\n') {
        state = normal;
        printf("Line %d: Double quotes not properly closed.\n", line);
        valid=0;
    }
    else if(state==dquote && c=='\\') { state = dquote_escape; }
    else if(state==dquote) { }
    else if(state==dquote_escape) { state = dquote; }
}
Last edited by Tool; 11-15-2009 at 06:56 AM.
OK, well, I've come up with some code, but it only works with the double_quote, comment and normal states. I didn't include character constants because I don't know how to treat them.
It reads the program into an array and processes the data with a while loop.
Code:
#include <stdio.h>
#include <stdlib.h>

#define MAX_SIZE 1024

int main(int argc, char *argv[])
{
    int c;
    int x=0;
    int pom;
    char array[MAX_SIZE];
    enum states { normal, comment, dquote, dquote_escape };
    int state = normal;

    /* To read from a file instead of stdin:
    FILE *f;
    f = fopen("zad88.c", "r");
    if (f == NULL) return 1;
    while((c=fgetc(f))!=EOF)
    */
    while((c=getchar())!=EOF) {
        /* leave room for the '\n' and '\0' appended below */
        if(x<MAX_SIZE-2)
            array[x++]=c;
        else
            return 1;
    }
    array[x]='\n';
    array[++x]='\0';

    pom=x;
    x=0;
    while(x!=pom)
    {
        if(array[x]=='/' && array[x+1] == '*' && state == normal) {
            array[x]=' '; state = comment; x++;
        }
        else if(array[x]=='*' && array[x+1] == '/' && state == comment) {
            array[x]=array[x+1]=' '; state = normal; x++;
        }
        else if(array[x]=='"' && state == normal) { state = dquote; }
        else if(array[x]=='\\' && state == dquote) { state = dquote_escape; }
        else if(state==dquote_escape) { state = dquote; }
        else if(state == dquote && array[x]=='\n') { state = normal; }
        else if(array[x]=='"' && state == dquote) { state = normal; }

        /* blank out everything while inside a comment */
        if(state == comment) array[x]=' ';
        x++;
    }

    printf("%s\n", array);
    printf("Press any key to continue.\n");
    getchar();
    return 0;
}
You should separate the next-state logic for each state, for instance:
Code:
switch(state){
case normal: {
    if(array[x]=='/' && array[x+1] == '*')
        state=comment;
    break;
}
case comment: {
    if(array[x]=='*' && array[x+1] == '/')
        state=normal;
    break;
}
}
It makes it easier to follow.
Single quote character literals will also be a state I think. They should work just like double quote, except that they begin and end with single quotes.
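To make the "just like double quote" idea concrete, here is a minimal sketch of the transitions with a single-quote state added. The names next_state and scan are made up for illustration (they are not from the exercise), and it assumes C89 rules, so // is not a comment. scan just reports the state you end up in after a piece of text.

```c
#include <stddef.h>

/* Hypothetical state names, loosely following the ones used in this thread. */
enum state { NORMAL, COMMENT_ENTRY, COMMENT, COMMENT_EXIT,
             SQUOTE, SQUOTE_ESCAPE, DQUOTE, DQUOTE_ESCAPE };

/* Return the state reached from s after reading character c. */
enum state next_state(enum state s, int c)
{
    switch (s) {
    case NORMAL:
    case COMMENT_ENTRY:
        if (c == '*' && s == COMMENT_ENTRY) return COMMENT;
        if (c == '/')  return COMMENT_ENTRY;  /* may start a comment */
        if (c == '\'') return SQUOTE;
        if (c == '"')  return DQUOTE;
        return NORMAL;
    case COMMENT:
        return c == '*' ? COMMENT_EXIT : COMMENT;
    case COMMENT_EXIT:
        if (c == '/') return NORMAL;
        return c == '*' ? COMMENT_EXIT : COMMENT;
    case SQUOTE:
        if (c == '\\') return SQUOTE_ESCAPE;
        /* a raw newline ends the literal (unterminated, but we recover) */
        return (c == '\'' || c == '\n') ? NORMAL : SQUOTE;
    case SQUOTE_ESCAPE:
        return SQUOTE;          /* escaped character is skipped */
    case DQUOTE:
        if (c == '\\') return DQUOTE_ESCAPE;
        return (c == '"' || c == '\n') ? NORMAL : DQUOTE;
    case DQUOTE_ESCAPE:
        return DQUOTE;
    }
    return NORMAL;
}

/* Run the machine over a whole string and report the final state. */
enum state scan(const char *text)
{
    enum state s = NORMAL;
    while (*text != '\0')
        s = next_state(s, (unsigned char)*text++);
    return s;
}
```

The point of the squote state is that '"' must not open a string and '\'' must not close the literal early; without it, a character constant holding a double quote would throw the rest of the scan off.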
And are you going to worry about the ??/ trigraph? Eg:
Code:
/*the below does not start a comment.*/
"??/"/*";
Last edited by King Mir; 11-15-2009 at 05:12 PM.
It is too clear and so it is hard to see.
A dunce once searched for fire with a lighted lantern.
Had he known what fire was,
He could have cooked his rice much sooner.
If this is a prelude to studying compilers, you might want to:
1 replace "\r\n" sequences with "\n", and replace "\r" by itself with "\n"
2 merge continued lines. Be aware that the backslash and newline may be separated by an arbitrary amount of white space.
3 Replace trigraphs, if you like.
4 Only then, strip comments. Or rather, replace them by a single space. Newline for BCPL (or CPP) comments makes for neater preprocessor output.
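Step 1 of the list above could look roughly like this. It is only a sketch: normalize_newlines is a made-up name, and it assumes the whole source is already in a NUL-terminated buffer that may be modified in place.

```c
#include <string.h>   /* size_t */

/* Convert "\r\n" pairs and lone '\r' characters to '\n', in place.
   Returns the new length of the string. */
size_t normalize_newlines(char *buf)
{
    size_t r = 0;   /* read position */
    size_t w = 0;   /* write position, never ahead of r */

    while (buf[r] != '\0') {
        if (buf[r] == '\r') {
            buf[w++] = '\n';
            /* swallow the '\n' of a "\r\n" pair as well */
            r += (buf[r + 1] == '\n') ? 2 : 1;
        } else {
            buf[w++] = buf[r++];
        }
    }
    buf[w] = '\0';
    return w;
}
```

Doing this first means every later step only has to recognise '\n' as the end of a line.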
You are aware of string literals. Treat character literals exactly as string literals. For example, do not strip comments from these lines (there are no comments):
Code:
x = "/* blah */";
y = '/* blah */';
The latter may be a syntactic error, but it is a valid token, and you should not care about it (yet).
When you are done worrying about terminators, continued lines, trigraphs and string literals, you may also want to correctly handle this:
Code:
# include <uh/*this_is_not_a_comment*/oh.h>
Which is a valid path.
Yeah, bummer, you need to parse preprocessor tokens as well.
Nobody said it was going to be easy...
[edit]
Dammit, I just found this:
If the characters ', \, ", //, or /* occur in the sequence between the < and > delimiters, the behavior is undefined. Similarly, if the characters ', \, //, or /* occur in the sequence between the " delimiters, the behavior is undefined.
And in a footnote:
Thus, sequences of characters that resemble escape sequences cause undefined behaviour.
This implies that my example is *NOT* valid as far as C source goes, and you are free to treat it however you like. I am going to have to edit this entire post.
The quote is from §6.4.7, paragraph 3, of
http://www.open-std.org/jtc1/sc22/wg...docs/n1336.pdf
The translation phases are described in §5.1.1.2.
[edit]:
It is summarised (in ascii art for the win) here:
Poster with the 8 phases of translation in the C language - Stack Overflow
[edit]:
I noted that it says "Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines."
You will find that some compilers allow for whitespace between the two (a mistake easily made). I am thinking GCC.
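The strict rule from the standard text quoted above (backslash immediately followed by new-line) can be sketched like this. splice_lines is a made-up name, the buffer is assumed to be NUL-terminated and already normalised to '\n' line endings, and the "whitespace in between" leniency is deliberately not implemented.

```c
#include <string.h>   /* size_t */

/* Delete every backslash that is immediately followed by a newline,
   together with that newline, in place. Returns the new length. */
size_t splice_lines(char *buf)
{
    size_t r = 0;   /* read position */
    size_t w = 0;   /* write position */

    while (buf[r] != '\0') {
        if (buf[r] == '\\' && buf[r + 1] == '\n')
            r += 2;                 /* drop both characters */
        else
            buf[w++] = buf[r++];
    }
    buf[w] = '\0';
    return w;
}
```

Note that a backslash followed by a space and then a newline is left alone here, which is what the strict reading asks for.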
When you are done replacing comments, you can continue with the other bullets. Let us see what you come up with. Especially when you have finished the compiler ;-).
No, really, I should like to comment as you go along, and be your designated beta tester ;-)
[another edit]:
Okay I see that in your previous source, you just overwrite comments with spaces and then dump the entire array. While that is of course a (pragmatic) way to do it, you could operate on a smaller "window" of the source file and generate output as you go (skipping comments). Instead of allocating memory (which you do statically, on the stack, imposing limits), you can use file offsets for recalling previous locations in the source file. I know, your focus is elsewhere, I am just suggesting... If a thing is worth doing, it is worth doing properly (someone said).
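For what it is worth, the "generate output as you go" idea can be sketched without any array at all. This is only an illustration under some assumptions: strip_comments is a made-up name, only /* */ comments are recognised (C89), each comment becomes a single space, character literals are treated exactly like string literals, and trigraphs and line splicing are assumed to have been dealt with already.

```c
#include <stdio.h>

/* Copy in to out, replacing each comment with one space. */
void strip_comments(FILE *in, FILE *out)
{
    enum { NORMAL, SLASH, COMMENT, STAR, QUOTED, ESCAPE } state = NORMAL;
    int quote = 0;      /* which quote character opened the literal */
    int c;

    while ((c = getc(in)) != EOF) {
        switch (state) {
        case NORMAL:
            if (c == '/') {
                state = SLASH;          /* hold the '/' back for now */
            } else {
                if (c == '"' || c == '\'') { quote = c; state = QUOTED; }
                putc(c, out);
            }
            break;
        case SLASH:
            if (c == '*') {
                state = COMMENT;        /* the held '/' is dropped */
            } else if (c == '/') {
                putc('/', out);         /* emit the old one, hold the new */
            } else {
                putc('/', out);
                putc(c, out);
                state = NORMAL;
                if (c == '"' || c == '\'') { quote = c; state = QUOTED; }
            }
            break;
        case COMMENT:
            if (c == '*') state = STAR;
            break;
        case STAR:
            if (c == '/') { putc(' ', out); state = NORMAL; }
            else if (c != '*') state = COMMENT;
            break;
        case QUOTED:
            putc(c, out);
            if (c == '\\') state = ESCAPE;
            else if (c == quote || c == '\n') state = NORMAL;
            break;
        case ESCAPE:
            putc(c, out);               /* escaped character, copied as is */
            state = QUOTED;
            break;
        }
    }
    if (state == SLASH)
        putc('/', out);                 /* input ended on a lone '/' */
}
```

Calling strip_comments(stdin, stdout) from main turns it into a filter. The only thing ever held back is one pending '/', so there is no size limit on the input.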
--
Aksel
Last edited by Aksel; 11-17-2009 at 01:29 AM.
In reply to Tool: The idea is that usually, lines in the physical source code correspond to lines in the logical source code, but it is possible to continue a logical line to the next physical line by "escaping" the new line at the end of the physical line. Aksel's observation is that some compilers do not conform to the standard strictly by allowing the "escaping" to have whitespace in between. Either way, you would not be removing the newline at the end of the logical line, but you will remove the backslash and newline that form the continuation.
In reply to Tool: They are certain sequences of three characters that are to be replaced by a single character. In King Mir's example, the sequence ??/ would be replaced by a single \ character.
In reply to Tool: You can interpret it as having the same effect as \n. To find out more, search on line endings.
About trigraphs: how can I use them in practice?
In my compiler, if I write for example ??< as the opening { brace, I get an error while trying to compile it:
5:1 C:\Users\SUZI\Desktop\C Smeće\zad97.c [Warning] trigraph ??< ignored, use -trigraphs to enable
Last edited by Tool; 11-18-2009 at 11:27 AM.
As the warning suggested, pass -trigraphs as an option when invoking your compiler.
Just to be clear:
I assume that trigraphs only work when inside the normal state (not inside a comment, a character constant or a quoted string), correct?
In reply to Tool: They work everywhere. The replacement is done in a phase before tokenisation, so at that point comments, character constants and string literals in the source have not been identified.
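Because the replacement happens before anything is tokenised, it can be done as a plain pass over the text with no states at all. A minimal sketch, assuming a NUL-terminated buffer that may be modified in place (replace_trigraphs is a made-up name):

```c
#include <string.h>

/* Replace the nine trigraph sequences with their single-character
   equivalents, in place. Returns the new length of the string. */
size_t replace_trigraphs(char *buf)
{
    /* third[i] is the character after the two question marks,
       repl[i] is what the whole sequence turns into */
    static const char third[] = "=(/)'<!>-";
    static const char repl[]  = "#[\\]^{|}~";
    size_t r = 0;   /* read position */
    size_t w = 0;   /* write position */

    while (buf[r] != '\0') {
        const char *hit = NULL;

        /* the '\0' check keeps strchr from matching the terminator */
        if (buf[r] == '?' && buf[r + 1] == '?' && buf[r + 2] != '\0')
            hit = strchr(third, buf[r + 2]);

        if (hit != NULL) {
            buf[w++] = repl[hit - third];
            r += 3;
        } else {
            buf[w++] = buf[r++];
        }
    }
    buf[w] = '\0';
    return w;
}
```

Run on King Mir's example, the string ends up containing an escaped double quote, so the /* that follows is still inside the string literal when the comment pass sees it.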