Hello to all. I have written a program that tokenizes a whole line. The basic idea: if the input is ***HELLO%%SIR.
then you have 8 tokens (2 words + 6 symbols). My algorithm is the following:
BEGIN LOOP
    IF CHARACTER IS NOT AN ALPHABETIC LETTER
        INCREASE THE ITEMS (SYMBOLS) COUNTER BY ONE
        ADVANCE THE ARRAY INDEX BY ONE
    OTHERWISE IF CHARACTER IS AN ALPHABETIC LETTER
        INCREASE THE WORDS COUNTER BY ONE
        ADVANCE THE ARRAY INDEX BY THE LENGTH OF THE WORD
    END IF
    IF CHARACTER IS THE LAST
        BREAK FROM THE LOOP
    END IF
END LOOP
PRINT ITEMS_COUNTER + WORD_COUNTER
And here is the implementation :
[C] C Words Tokenizer - Pastebin.com
First of all, this is not a university exercise. It is something I came up with myself. What is your opinion of it? Is it useful? Should tokens be only the words between delimiters? I am analyzing the whole line, symbols included.
Secondly, I want to print the symbols and the words. I don't want to use strtok for this, because I think the function line_tokenizer would not be reusable: each time strtok is called, it modifies the string by writing a '\0' after the word. Is there any other way to get the following OUTPUT:
Code:
Give the sentence: ***HELLO%%SIR.
Analyzing...
The line has 8 token(s).
Symbols : ***%%.
Words : HELLO SIR
Any other suggestions for the program or the algorithm are welcome.
Thank you in advance