-
identifying string input
How can I count how many times each symbol occurs in the C code in filex.c?
E.g., the expected output would be:
Code:
while 1
( 3
) 3
} 1
{ 1
; 2
!= 1
++ 1
identifiers 8 (c, getchar, EOF, sentence, i, etc)
= 3
[ 2
] 2
char const 1 ('\0')
Code:
//filex.c
while(( c = getchar()) != EOF){
sentence[i++] = c;
}
sentence[i] = '\0';
With the code below, I have split the C code into whitespace-separated tokens, which come out as:
Code:
while((
c
=
getchar())
!=
EOF){
sentence[i++]
=
c;
}
sentence[i]
=
'\0';
From here, how can I go further and identify what each token contains?
Code:
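/* Loop on fscanf's return value instead of feof(): with feof(), the
   last word is skipped when the file does not end in whitespace. */
while (fscanf(cfPtr, "%s", xyz) == 1) {
    processinfo(xyz);
}
fclose(cfPtr);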
}
-
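What you want to do is called lexical analysis. To do it properly you need to be able to tell apart identifiers, keywords, operators, separators, comments and constants. Keywords, operators and separators are fixed sets, so you can put them in lookup tables; anything that looks like a name and is not in the keyword table is an identifier. Splitting on whitespace with fscanf "%s" is not enough, because C does not require whitespace between tokens, which is why you end up with chunks like "while((" and "sentence[i++]". The only difficult part is that some tokens are built from other tokens, yet do not perform the same function: "++" is not two "+" operators, and "!=" is not "!" followed by "=". So your lexical analyzer must follow the "maximal munch" rule that C compilers use, which means you always accept the longest valid token before moving on to the next one. All of this is easiest if you read the input one character at a time (for example with getc) and use ungetc to push back a character when you have read one too far.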
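To make that concrete, here is a minimal sketch of such a tokenizer. The names (next_token, enum kind, the table contents) are my own and only for illustration, and the tables are small subsets rather than the full C keyword and operator sets. It assumes the input has no comments, string literals or preprocessor lines, and it only knows one- and two-character operators.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum kind { T_EOF, T_KEYWORD, T_IDENT, T_NUMBER, T_CHARCONST, T_OP };

/* Lookup tables: illustrative subsets, not the complete C sets. */
static const char *keywords[] = { "while", "if", "else", "for", "return", "int", "char" };
static const char *two_char_ops[] = { "!=", "==", "++", "--", "<=", ">=", "&&", "||", "+=", "-=" };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

static int in_table(const char *s, const char **tab, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, tab[i]) == 0)
            return 1;
    return 0;
}

/* Reads the next token from in into buf (capacity cap, at least 3)
   and returns its kind. Over-long tokens are truncated to fit buf. */
enum kind next_token(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    assert(in != NULL && buf != NULL && cap >= 3);

    do { c = getc(in); } while (c != EOF && isspace(c));   /* skip whitespace */
    if (c == EOF) { buf[0] = '\0'; return T_EOF; }
    buf[n++] = (char)c;

    if (isalpha(c) || c == '_') {            /* identifier or keyword */
        while ((c = getc(in)) != EOF && (isalnum(c) || c == '_'))
            if (n < cap - 1) buf[n++] = (char)c;
        if (c != EOF) ungetc(c, in);         /* read one too far: push it back */
        buf[n] = '\0';
        return in_table(buf, keywords, COUNT(keywords)) ? T_KEYWORD : T_IDENT;
    }
    if (isdigit(c)) {                        /* crude numeric constant */
        while ((c = getc(in)) != EOF && (isalnum(c) || c == '.'))
            if (n < cap - 1) buf[n++] = (char)c;
        if (c != EOF) ungetc(c, in);
        buf[n] = '\0';
        return T_NUMBER;
    }
    if (c == '\'') {                         /* character constant such as '\0' */
        int escaped = 0;
        while ((c = getc(in)) != EOF) {
            if (n < cap - 1) buf[n++] = (char)c;
            if (escaped) escaped = 0;
            else if (c == '\\') escaped = 1;
            else if (c == '\'') break;       /* unescaped closing quote */
        }
        buf[n] = '\0';
        return T_CHARCONST;
    }

    /* Operator or separator. Maximal munch: try the two-character
       token first and fall back to one character if it is not valid. */
    c = getc(in);
    if (c != EOF) {
        buf[1] = (char)c;
        buf[2] = '\0';
        if (in_table(buf, two_char_ops, COUNT(two_char_ops)))
            return T_OP;
        ungetc(c, in);
    }
    buf[1] = '\0';
    return T_OP;
}
```

Call next_token in a loop until it returns T_EOF, and tally by the returned kind (identifiers, keywords, constants) or by the text left in buf (for "(", "!=", "++" and so on). Adding three-character operators such as "<<=" would need either a second pushed-back character, which ungetc does not guarantee, or your own small lookahead buffer.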