Thread: C program tokenizer.

  1. #1
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357

    C program tokenizer.

    Hello to all. I have written a program that tokenizes a whole line. The basic idea: if the input is ***HELLO%%SIR.

    then there are 8 tokens (2 words + 6 symbols). My algorithm is as follows:

    BEGIN LOOP
        IF CHARACTER IS NOT AN ALPHABETIC LETTER
            INCREASE THE ITEMS COUNTER BY ONE
            ADVANCE THE ARRAY INDEX BY ONE
        ELSE (CHARACTER IS AN ALPHABETIC LETTER)
            INCREASE THE WORD COUNTER BY ONE
            ADVANCE THE ARRAY INDEX BY THE LENGTH OF THE WORD
        END IF

        IF CHARACTER IS THE LAST
            BREAK FROM THE LOOP
        END IF
    END LOOP

    PRINT ITEMS_COUNTER + WORD_COUNTER


    And here is the implementation:

    [C] C Words Tokenizer - Pastebin.com

    First of all, this is not a university exercise; it is my own idea. What is your opinion of it? Is it useful? Should tokens be only the words between delimiters? I am analyzing the whole line, symbols included.

    Secondly, I want to print the symbols and the words. I don't want to use strtok for this, because I think it would make my line_tokenizer function non-reusable: strtok writes a '\0' after each token it finds, so it modifies the caller's string, and it keeps hidden state between calls. Is there another way to get this OUTPUT:

    Code:

    Give the sentence: ***HELLO%%SIR.

              Analyzing...

    The line has 8 token(s).

    Symbols : ***%%.
    Words : HELLO  SIR

    Any other suggestions for the program or the algorithm are welcome.

    Thank you in advance.

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    What about numbers (digits)?

  3. #3
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    Quote (originally posted by Adak):
    What about numbers (digits)?
    So far the program treats numbers as symbols.

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    So the band Blink123 becomes just "Blink", and 3 symbols??

    How about this: if the number is a prefix or suffix of a word, it's part of the name; otherwise it's a symbol?

  5. #5
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    Quote (originally posted by Adak):
    So the band Blink123 becomes just "Blink", and 3 symbols??

    How about this: if the number is a prefix or suffix of a word, it's part of the name; otherwise it's a symbol?
    Hmm, good test. I hadn't considered this kind of input. According to the documentation, 123Blink and Blink123 must give the same result. I'll look into what goes wrong. :/

    Maybe I should fix the program to handle numbers too. For example:

    Code:
     Hello_How_are_you?123

    OUTPUT: 11 tokens, or 9 if the whole 123 counts as one token

    Do you think this exercise is useful enough to keep working on?
    Last edited by Mr.Lnx; 11-28-2013 at 02:11 PM.

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Any little problem or puzzle that you find interesting and challenging, should be at least given a try. There isn't enough time to go through things that don't interest or challenge you.

    Make sure the "juice" is worth the "squeeze", but you need to "squeeze" things, or you'll never develop adequate hand strength in programming.


Similar Threads

  1. String Tokenizer Help
    By KuuKuu in forum C Programming
    Replies: 5
    Last Post: 01-21-2013, 04:16 PM
  2. Is this a bug of boost::tokenizer ?
    By meili100 in forum C++ Programming
    Replies: 2
    Last Post: 03-14-2008, 08:20 PM
  3. C++ String Tokenizer
    By Annorax in forum Game Programming
    Replies: 10
    Last Post: 07-13-2005, 10:41 AM
  4. Tokenizer in C
    By Tarik in forum C Programming
    Replies: 21
    Last Post: 08-26-2004, 06:36 AM
  5. Tokenizer
    By PJYelton in forum C++ Programming
    Replies: 2
    Last Post: 01-29-2003, 03:01 PM