Thread: String/token parsing

  1. #1
    Registered User
    Join Date
    Aug 2006
    Posts
    43

    String/token parsing

    I'm attempting to write some code that will extract certain tokens from a string of variable length. The code I have now is long and messy (loaded with switch statements, if/thens, etc), and I'm sure there's a more elegant solution. For the sake of brevity, say this is the list of valid tokens:

    Code:
    A
    AZ
    @
    :...:        //String between colons to be read as a string and not tokenized
    {0-9}    //Any single-digit integer
    So, valid strings could be:

    Code:
    A@5AAZ
    @01:Hello, world!:@A
    AAAZ:AZ:A1
    @7@7AZ
    For the most part, the strings will be >255 characters in length, and read from a plain text file. The number of valid tokens is more likely to be several dozen. Some tokens will be followed by a string literal. The code also needs to be able to ignore anything that isn't a valid token. Order counts.

    Are there any tricks to make this easier? Or do I really need to brute force my way through the string? As it stands now, my code is pages of this:

    Code:
    switch(input_char)
    {
    case 'A':
        if(next_char == 'Z')
        {
            //do stuff
        }
        else
        {
            //do some other stuff
        }
        break;
    //etc.
    }
    It's functional, but it's pretty ugly. And anytime I need to add/change a token, it's not easy. Any help would be greatly appreciated.

  2. #2
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    Is this an academic or a practical question?

    If the latter,

    Have a look at boost tokenizer. It does the dirty work for you.
    http://www.boost.org/libs/tokenizer/index.html

    If you need more advanced parsing, you could try boost spirit
    http://spirit.sourceforge.net/

  3. #3
    Registered User
    Join Date
    Aug 2006
    Posts
    43
    Originally Posted by pheres
    Is this an academic or a practical question?
    It's a little bit of each. It's academic in that I like to know how/why things are done (I'm constantly reinventing the wheel). It's practical in that my code is becoming impractical, and I need a better solution (that, and I generally find that my wheel isn't as good as whatever else is out there).

    FWIW, this is for a silly project of mine. It's not a homework question.

  4. #4
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    You could introduce a layer of abstraction. Parsing regular languages is the elemental job of finite state machines (FSMs). You have to build one that either just accepts your language or acts according to the tokens it reads. Google will probably give you more references about parsing and FSMs than you want to read.
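    To make the FSM idea concrete, here is a minimal sketch for the five example tokens from post #1. The names (Kind, Token, TokenMachine) are made up for illustration, and dropping an unterminated ":..." literal at end of input is my assumption, not something the original post specifies.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Token kinds from the example list in post #1 (names are hypothetical).
enum class Kind { A, AZ, At, Digit, Text };

struct Token {
    Kind kind;
    std::string text;  // the digit or the literal's contents; empty otherwise
};

// Hypothetical class: the switch-style logic lives here, hidden from callers.
class TokenMachine {
public:
    std::vector<Token> run(const std::string& input) {
        tokens_.clear();
        literal_.clear();
        state_ = State::Start;
        for (char c : input) step(c);
        if (state_ == State::SeenA) emit(Kind::A);  // input ended right after an 'A'
        // Assumption: an unterminated ":..." literal at end of input is dropped.
        return tokens_;
    }

private:
    enum class State { Start, SeenA, InLiteral };

    void emit(Kind kind, const std::string& text = "") {
        tokens_.push_back(Token{kind, text});
    }

    // One transition of the machine per input character.
    void step(char c) {
        if (state_ == State::InLiteral) {
            if (c == ':') {  // closing colon ends the literal
                emit(Kind::Text, literal_);
                literal_.clear();
                state_ = State::Start;
            } else {
                literal_ += c;  // nothing inside a literal is tokenized
            }
            return;
        }
        if (state_ == State::SeenA) {
            state_ = State::Start;
            if (c == 'Z') { emit(Kind::AZ); return; }
            emit(Kind::A);  // plain 'A'; c still has to be handled below
        }
        assert(state_ == State::Start);
        if (c == 'A') state_ = State::SeenA;
        else if (c == '@') emit(Kind::At);
        else if (c == ':') state_ = State::InLiteral;
        else if (std::isdigit(static_cast<unsigned char>(c))) emit(Kind::Digit, std::string(1, c));
        // Anything else is not a valid token and is ignored.
    }

    State state_ = State::Start;
    std::string literal_;
    std::vector<Token> tokens_;
};
```

    Calling TokenMachine().run("AAAZ:AZ:A1") walks the string once and returns the tokens in order; the caller never sees the state handling. Adding a token still means touching step(), so this hides the mess rather than removing it.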
    The code for your FSM may be messy as well, but it's hidden inside a class (or a class system with its own classes for actions, states, transitions and so on). Or you could make the FSM data driven. For example, you could describe the needed states, transitions and actions (which are IDs of functions you register) in XML and build just a loader and executor inside your app. But that would be a bit of overkill for a little project.
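    A rough sketch of the data-driven variant, with the table filled in from code rather than loaded from XML (that swap is mine, to keep the example short). TableLexer, register_action and add_rule are hypothetical names. It only covers tokens with a fixed spelling and picks the longest match at each position; the ":...:" literals would need an extra rule type that is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// An action receives the text of the token that triggered it.
using Action = std::function<void(const std::string& lexeme)>;

// Hypothetical class: tokens and actions are data, so adding one is a table entry.
class TableLexer {
public:
    // Register a function under an ID that rules can refer to.
    void register_action(const std::string& id, Action fn) {
        actions_[id] = std::move(fn);
    }

    // Map a fixed token spelling to a registered action ID.
    void add_rule(const std::string& spelling, const std::string& action_id) {
        assert(!spelling.empty());
        rules_.push_back(Rule{spelling, action_id});
    }

    // Scan the input left to right; returns the number of tokens matched.
    std::size_t run(const std::string& input) const {
        std::size_t matched = 0;
        std::size_t pos = 0;
        while (pos < input.size()) {
            const Rule* best = nullptr;
            for (const Rule& rule : rules_) {
                const bool fits =
                    input.compare(pos, rule.spelling.size(), rule.spelling) == 0;
                // Prefer the longest spelling so "AZ" wins over "A".
                if (fits && (!best || rule.spelling.size() > best->spelling.size())) {
                    best = &rule;
                }
            }
            if (!best) {  // not a valid token: skip one character
                ++pos;
                continue;
            }
            auto it = actions_.find(best->action_id);
            if (it != actions_.end()) it->second(best->spelling);
            pos += best->spelling.size();
            ++matched;
        }
        return matched;
    }

private:
    struct Rule {
        std::string spelling;
        std::string action_id;
    };
    std::vector<Rule> rules_;
    std::map<std::string, Action> actions_;
};
```

    With rules for "A", "AZ", "@" and the ten digits, run("A@5AAZ") fires the actions for A, @, 5, A, AZ in that order. The linear search over the rules is fine for a few dozen tokens; a real loader would read the same spelling/action-ID pairs from a file instead of calling add_rule by hand.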
    Last edited by pheres; 03-04-2008 at 09:55 AM.
