Thread: Parsing for Dummies

  1. #1

    Parsing for Dummies

    Okay, yes, I have checked every resource on cprogramming.com, and I am still confused about the concept of string parsing and/or separating strings into tokens. It isn't that there isn't enough information; I just haven't found any that's "dumbed down" enough for someone who is completely new to parsing.

    So I know what parsing is.
    What I'm wondering is: what are the rudimentary basics of parsing and string separation in programming, and how do they work logically?

    Many thanks. I know this is a bit vague; I tried to focus my question as well as I could.
    "None are more hopelessly enslaved than those who falsely believe they are free."
    -Goethe

    [paraphrased] "Those who would sacrifice essential liberties for a little temporary safety deserve neither."
    -Benjamin Franklin

  2. #2
    It depends on what you're looking for. If you want to take the string "This is a string. This is another part of this string." and separate it into two strings (one for each sentence), you would iterate until you found a period. If you wanted to parse all the words out of a string into multiple strings, you could use two pointers: one marks the start of a word while the other iterates until it finds a non-alphanumeric character. The difference between the two pointers is the word's length, so you copy that many characters into a new string.
    There are multiple ways of parsing strings, and all of them rely on some form of iteration.
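    A minimal sketch of the two-pointer idea in C, assuming a "word" is a run of characters accepted by isalnum(). The function name next_piece, the predicate parameter and the truncating copy are illustrative choices, not anything from the post. Passing a different predicate (the hypothetical not_period below) turns the same loop into the sentence splitter described first.

```c
#include <ctype.h>   /* isalnum */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy */

/* Finds the next run of characters for which in_piece() is true,
   starting at *pos. Copies it into out (truncated to outsz - 1 chars;
   outsz must be at least 1), advances *pos past the run and returns
   the run's full length. Returns 0 when no run is left. */
size_t next_piece(const char **pos, int (*in_piece)(int),
                  char *out, size_t outsz)
{
    const char *start = *pos;

    /* First pointer: skip characters that don't belong to a piece. */
    while (*start && !in_piece((unsigned char)*start))
        start++;

    /* Second pointer: walk to the end of the piece. */
    const char *end = start;
    while (*end && in_piece((unsigned char)*end))
        end++;

    size_t len = (size_t)(end - start);   /* difference of the two pointers */
    size_t n = len < outsz - 1 ? len : outsz - 1;
    memcpy(out, start, n);
    out[n] = '\0';

    *pos = end;                           /* resume after this piece */
    return len;
}

/* Hypothetical predicate for sentence splitting: anything but '.'. */
int not_period(int c)
{
    return c != '.';
}
```

    With isalnum as the predicate this yields "This", "is", "a", "string", and so on; with not_period it yields "This is a string" and then " This is another part of this string" (the space after the first period stays attached, since only periods are treated as separators).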
    "Focused, hard work is the real key to success. Keep your eyes on the goal, and just keep taking the next step towards completing it." -John Carmack

  3. #3
    In my mind, parsing involves taking an original string and looking for a given set of targets, which may be substrings or single characters. Wherever a target is found, the full string is subdivided into resulting substrings (aka tokens), either physically (for example by writing a '\0' over each delimiter, the way strtok does) or logically (by just recording where each token starts and how long it is).
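    To make the physical/logical distinction concrete, here is a sketch using only standard <string.h> functions: strtok() physically cuts the string by overwriting delimiters with '\0', while strspn()/strcspn() let you leave the string alone and record positions. The names split_physical, split_logical and struct span are made up for illustration.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strtok, strspn, strcspn */

/* "Physical" split: strtok() overwrites the delimiter that ends each
   token with '\0', so the tokens become separate C strings inside str.
   str must be writable. Returns the number of tokens stored. */
size_t split_physical(char *str, const char *delims,
                      char **tokens, size_t max_tokens)
{
    size_t n = 0;
    for (char *t = strtok(str, delims); t && n < max_tokens;
         t = strtok(NULL, delims))
        tokens[n++] = t;
    return n;
}

/* A token recorded "logically": where it starts and how long it is. */
struct span {
    size_t start;
    size_t len;
};

/* "Logical" split: str is left untouched; only positions are recorded. */
size_t split_logical(const char *str, const char *delims,
                     struct span *spans, size_t max_spans)
{
    size_t n = 0, pos = 0;
    while (n < max_spans) {
        pos += strspn(str + pos, delims);         /* skip delimiters */
        if (str[pos] == '\0')
            break;                                /* no tokens left */
        size_t len = strcspn(str + pos, delims);  /* measure the token */
        spans[n].start = pos;
        spans[n].len = len;
        n++;
        pos += len;
    }
    return n;
}
```

    The logical version works on string literals and other read-only data, and strtok's hidden internal state means only one physical split can be in progress at a time.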

  4. #4
    Sebastiani
    Here's an example of two very rudimentary parsing implementations to illustrate the basic concept. I hope it helps.

    Both use a pointer to a pointer into the buffer (i.e. a char**) to keep track of the current position, so each call can move the caller's position past the token it just returned. The first example simply reads in tokens that are separated by one or more of a certain character (usually a space).


    Code:
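    #include <string.h> /* strncpy */

    /*
      Copies the next token (a run of characters other than sep)
      into buffer and moves *next to the end of that token.

      return values:
         0 : success
        -1 : done
       > 0 : buffer too small, need this
             many bytes for next token.
    */
    int token(
      char * buffer,
      int max,
      char ** next,
      char sep)
    {
        const char * ptr, * start = NULL;

        for(ptr = *next; *ptr; ++ptr)
        {
            if(*ptr != sep)
            {
                if(start == NULL)
                {
                    start = ptr; // first char in token
                }
            }
            else
            {
                if(start != NULL)
                {
                    break; // separator after the token: ready to copy
                }
            }
        }

        if(start == NULL)
        {
            return -1; // nothing but separators left: done
        }

        int lcopy = ptr - start;

        if(lcopy > max - 1) // max - 1 for null-term
        {
            return lcopy + 1; // need a buffer this big (*next is not moved)
        }

        strncpy(buffer, start, lcopy);

        buffer[lcopy] = 0; // strncpy won't add the terminator here

        *next = (char*)ptr; // resume from here on the next call

        return 0;
    }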

    Its usage would be like this:


    Code:
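    #include <stdio.h> /* printf */

    int main()
    {
        const int bufsz = 1024;

        char buf[bufsz];

        char data[] = "This    data...needs   to be  parsed  ";

        char * iter = data;

        /* Output:
           Token: 'This'.
           Token: 'data...needs'.
           Token: 'to'.
           Token: 'be'.
           Token: 'parsed'.
        */
        while(0 == token(buf, bufsz, &iter, ' '))
        {
            printf("Token: '%s'.\n", buf);
        }

        return 0;
    }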
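    The loop above stops as soon as token() returns anything other than 0, so a token longer than bufsz - 1 characters ends the loop exactly as if parsing were finished. A sketch of a caller that honors the "> 0" return value instead: it grows a heap buffer to the requested size and retries, which works because token() leaves *next alone when it fails. count_tokens is a hypothetical name, and the token() inside is a condensed copy of the first version (same contract) so the block compiles by itself.

```c
#include <stdlib.h>  /* malloc, realloc, free */
#include <string.h>  /* strncpy, strlen */

/* Condensed copy of the first token() (same return values),
   repeated only so this sketch is self-contained. */
int token(char *buffer, int max, char **next, char sep)
{
    const char *ptr = *next, *start;
    while (*ptr == sep)              /* skip leading separators */
        ++ptr;
    if (*ptr == '\0')
        return -1;                   /* done */
    start = ptr;
    while (*ptr && *ptr != sep)      /* find the end of the token */
        ++ptr;
    int lcopy = ptr - start;
    if (lcopy > max - 1)
        return lcopy + 1;            /* need a buffer this big */
    strncpy(buffer, start, lcopy);
    buffer[lcopy] = 0;
    *next = (char *)ptr;
    return 0;
}

/* Hypothetical caller: counts the tokens in data, growing the buffer
   whenever token() reports that the next token doesn't fit.
   *longest receives the length of the longest token. */
int count_tokens(char *data, char sep, int *longest)
{
    int size = 4;                    /* deliberately tiny starting size */
    char *buf = malloc(size);
    char *iter = data;
    int count = 0, rc;

    if (buf == NULL)
        return -1;
    *longest = 0;

    while ((rc = token(buf, size, &iter, sep)) != -1) {
        if (rc > 0) {                /* too small: rc bytes needed */
            char *bigger = realloc(buf, rc);
            if (bigger == NULL) {
                free(buf);
                return -1;
            }
            buf = bigger;
            size = rc;
            continue;                /* iter wasn't moved, so just retry */
        }
        ++count;
        if ((int)strlen(buf) > *longest)
            *longest = (int)strlen(buf);
    }

    free(buf);
    return count;
}
```

    On the thread's sample data this grows the buffer twice (for "This" and "data...needs") and counts five tokens.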

    The next one takes a slightly different approach: it skips over any character that doesn't appear in a given string of allowed 'token' characters. It uses two helper functions, find_first_of and find_first_not_of, to accomplish that.


    Code:
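    #include <stdbool.h> /* bool (built in under C++) */
    #include <string.h>  /* NULL, strlen, strncpy */

    /* Returns a pointer to the first char of str that appears
       in find, or NULL if there is none. */
    const char * find_first_of(
      const char * str,
      const char * find)
    {
        const char * ptr;

        for( ; *str; ++str)
        {
            for(ptr = find; *ptr; ++ptr)
            {
                if(*str == *ptr)
                {
                    return str;
                }
            }
        }

        return NULL;
    }


    /* Returns a pointer to the first char of str that does NOT
       appear in find, or NULL if every char of str is in find. */
    const char * find_first_not_of(
      const char * str,
      const char * find)
    {
        const char * ptr;

        bool found;

        for( ; *str; ++str)
        {
            found = false;

            for(ptr = find; *ptr; ++ptr)
            {
                if(*str == *ptr)
                {
                    found = true;
                    break;
                }
            }

            if(!found)
            {
                return str;
            }
        }

        return NULL;
    }


    /*
      Second version of token(): a token is a run of characters that
      all appear in match. Use it instead of the first version; plain
      C can't have two functions named token in one program.

      return values:
         0 : success
        -1 : done
       > 0 : buffer too small, need this
             many bytes for next token.
    */
    int token(
      char * buffer,
      int max,
      char ** next,
      const char * match)
    {
        const char * start = find_first_of(*next, match);

        if(start == NULL)
        {
            return -1; // no token characters left
        }

        const char * ptr = find_first_not_of(start, match);

        if(ptr == NULL)
        {
            // token runs to the end of the string: end it at the terminator
            ptr = start + strlen(start);
        }

        int lcopy = ptr - start;

        if(lcopy > max - 1) // max - 1 for null-term
        {
            return lcopy + 1; // need a buffer this big
        }

        strncpy(buffer, start, lcopy);

        buffer[lcopy] = 0;

        *next = (char*)ptr; // next search starts at the first non-token char

        return 0;
    }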

    The usage for that would be:


    Code:
    #include <stdio.h> /* printf */

    int main()
    {
        const int bufsz = 1024;

        char buf[bufsz];

        const char * accept = "abcdefghijklmnopq"
                              "rstuvwxyzABCDEFGH"
                              "IJKLMNOPQRSTUVWXYZ";

        char data[] = "This    data...needs   to be  parsed  ";

        char * iter = data;

        /* Output:
           Token: 'This'.
           Token: 'data'.
           Token: 'needs'.
           Token: 'to'.
           Token: 'be'.
           Token: 'parsed'.
        */
        while(0 == token(buf, bufsz, &iter, accept))
        {
            printf("Token: '%s'.\n", buf);
        }

        return 0;
    }
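    For what it's worth, the two helpers mirror standard C functions: strpbrk() behaves like find_first_of(), and strspn()/strcspn() report how far a run of in-set or out-of-set characters extends. A sketch of the second token() rebuilt on those, assuming the same return-value contract; token_std is a made-up name so it can sit beside the original.

```c
#include <string.h>  /* strcspn, strspn, memcpy */

/* Same contract as the second token():
   strcspn(s, set) = length of the prefix of s with no chars from set,
   strspn(s, set)  = length of the prefix of s made only of chars from set. */
int token_std(char *buffer, int max, char **next, const char *match)
{
    /* Skip everything that isn't a token character (like find_first_of). */
    char *start = *next + strcspn(*next, match);
    if (*start == '\0')
        return -1;                      /* no token left */

    /* Measure the run of token characters (like find_first_not_of). */
    int lcopy = (int)strspn(start, match);
    if (lcopy > max - 1)
        return lcopy + 1;               /* need a buffer this big */

    memcpy(buffer, start, lcopy);
    buffer[lcopy] = '\0';
    *next = start + lcopy;              /* resume after the token */
    return 0;
}
```

    A token that runs right up to the end of the string comes out whole, because strspn() simply stops at the terminator.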
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  5. #5
    Whoa... thanks! I'm going to have to sit down with a cup of coffee and start "translating" all of that in my head... haha... thanks.