C Board  

03-07-2004, 01:13 AM   #1
Registered User
 
Join Date: Apr 2003
Posts: 17
Parsing for Dummies

Okay, yes, I have checked every resource on cprogramming.com, and yet I am still confused about string parsing and separating strings into tokens. It isn't that there isn't enough information; I just haven't found anything that's "dumbed-down" enough for someone who is completely new to parsing.

So I know what parsing is.
But what I'm wondering is: what are the basics of parsing and string separation in programming terms, and how do they work logically?

Many thanks. I know that this is a bit vague; I tried to focus my question as well as I could.
__________________
"None are more hopelessly enslaved than those who falsely believe they are free."
-Goethe

[paraphrased] "Those who would sacrifice essential liberties for a little temporary safety deserve neither."
-Benjamin Franklin
MisterWonderful
03-07-2004, 01:29 AM   #2
Registered User
 
Join Date: Jan 2004
Posts: 33
It depends on what you're looking for. If you want to take the string "This is a string. This is another part of this string." and separate it into two strings (one for each sentence), you would iterate until you found a period. If you wanted to parse all the words out of a string into multiple strings, you could use two pointers: advance one until it finds a non-alphanumeric character, take the difference between the two pointers, and copy that many characters into a new string.
There are multiple ways of parsing strings, and all of them rely on some form of iteration.
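
A minimal sketch of the two-pointer idea, not a definitive implementation. The name split_words is made up for illustration, and it assumes a "word" is any run of characters for which std::isalnum is true:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of alphanumeric characters in 'text'.
std::vector<std::string> split_words(const char * text)
{
    std::vector<std::string> words;
    const char * begin = text;

    while(*begin)
    {
        // skip anything that cannot be part of a word
        while(*begin && !std::isalnum(static_cast<unsigned char>(*begin)))
        {
            ++begin;
        }

        // advance a second pointer to the first non-alphanumeric character
        const char * end = begin;
        while(*end && std::isalnum(static_cast<unsigned char>(*end)))
        {
            ++end;
        }

        if(end != begin)
        {
            // the difference between the two pointers is the word length
            std::size_t length = static_cast<std::size_t>(end - begin);
            words.push_back(std::string(begin, length));
        }

        begin = end; // continue after the word just copied
    }

    return words;
}
```

Called on the example sentence from this post, split_words should return its eleven words with the periods and spaces dropped.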
__________________
"Focused, hard work is the real key to success. Keep your eyes on the goal, and just keep taking the next step towards completing it." -John Carmack
/Muad'Dib\
03-07-2004, 09:51 AM   #3
Registered User
 
Join Date: Mar 2002
Posts: 1,595
In my mind, parsing involves taking a given original string and looking for a given set of targets, which may be substrings or single characters. Whenever a target is found, the full string is subdivided, either physically (by copying pieces out) or logically (by recording where each piece starts and ends), into resultant substrings (aka tokens) according to some protocol.
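
One way to picture the "logical" subdivision is to record where each token sits instead of copying it. The sketch below assumes the targets are single delimiter characters; token_spans is a hypothetical name, and it relies only on the standard std::string members find_first_of and find_first_not_of:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns one (offset, length) pair per token in 'text'.
// 'targets' lists the delimiter characters to look for.
std::vector<std::pair<std::size_t, std::size_t> >
token_spans(const std::string & text, const std::string & targets)
{
    std::vector<std::pair<std::size_t, std::size_t> > spans;

    // first character that is not a delimiter starts the first token
    std::size_t start = text.find_first_not_of(targets);

    while(start != std::string::npos)
    {
        // the next delimiter (or the end of the string) ends the token
        std::size_t end = text.find_first_of(targets, start);
        if(end == std::string::npos)
        {
            end = text.size();
        }

        spans.push_back(std::make_pair(start, end - start));

        // skip the delimiters to find the start of the next token
        start = text.find_first_not_of(targets, end);
    }

    return spans;
}
```

The "physical" version is then one extra step: text.substr(offset, length) copies a recorded span out into its own string.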
elad
03-07-2004, 02:45 PM   #4
Guest
 
Join Date: Aug 2001
Posts: 4,923
Here's an example of two very rudimentary parsing implementations to illustrate the basic concept. I hope it helps.

Both use a pointer to a pointer into the buffer (i.e., a char**) to keep track of the current position. The first example simply reads in tokens that are separated by one or more occurrences of a certain character (usually a space).


Code:
#include <cstddef>  // NULL
#include <cstring>  // strncpy

/*
  return values:
     0 : success
    -1 : done
   > 0 : buffer too small, need this
         many bytes for next token.
*/
int token(
    char * buffer,
    int max,
    char ** next,
    char sep)
{
    const char * ptr, * start = NULL;

    for(ptr = *next; *ptr; ++ptr)
    {
        if(*ptr != sep)
        {
            if(start == NULL)
            {
                start = ptr; // first char in token
            }
        }
        else if(start != NULL)
        {
            break; // separator after a token: ready to copy
        }
    }

    if(start == NULL)
    {
        return -1; // done: only separators (or nothing) left
    }

    int lcopy = (int)(ptr - start); // token length, without null-term

    if(lcopy > max - 1) // max - 1 for null-term
    {
        return lcopy + 1; // need a buffer this big
    }

    strncpy(buffer, start, lcopy);

    buffer[lcopy] = 0;

    *next = (char*)ptr; // resume from here on the next call

    return 0;
}

Its usage would be like this:


Code:
#include <cstdio>  // printf

int main()
{
    const int bufsz = 1024;

    char buf[bufsz];

    char data[] = "This    data...needs   to be  parsed  ";

    char * iter = data; // token() advances this past each token

    while(0 == token(buf, bufsz, &iter, ' '))
    {
        printf("Token: '%s'.\n", buf);
    }

    return 0;
}

The next one takes a slightly different approach by skipping over any character that isn't in a given string of accepted 'token-type' characters. It uses two helper functions, find_first_of and find_first_not_of, to accomplish that.


Code:
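#include <cstddef>  // NULL
#include <cstring>  // strlen, strncpy

// first char in 'str' that appears in 'find', or NULL if there is none
const char * find_first_of(
    const char * str,
    const char * find)
{
    const char * ptr;

    for( ; *str; ++str)
    {
        for(ptr = find; *ptr; ++ptr)
        {
            if(*str == *ptr)
            {
                return str;
            }
        }
    }

    return NULL;
}


// first char in 'str' that does NOT appear in 'find', or NULL if there is none
const char * find_first_not_of(
    const char * str,
    const char * find)
{
    const char * ptr;

    bool found;

    for( ; *str; ++str)
    {
        found = false;

        for(ptr = find; *ptr; ++ptr)
        {
            if(*str == *ptr)
            {
                found = true;
                break; // no need to check the rest of 'find'
            }
        }

        if(!found)
        {
            return str;
        }
    }

    return NULL;
}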


/*
  return values:
     0 : success
    -1 : done
   > 0 : buffer too small, need this
         many bytes for next token.
*/
int token(
    char * buffer,
    int max,
    char ** next,
    const char * match)
{
    const char * start = find_first_of(*next, match);

    if(start == NULL)
    {
        return -1; // no more token characters
    }

    const char * ptr = find_first_not_of(start, match);

    if(ptr == NULL)
    {
        // token runs to the end of the string: stop at the
        // null-term so the last character is not dropped
        ptr = start + strlen(start);
    }

    int lcopy = (int)(ptr - start);

    if(lcopy > max - 1) // max - 1 for null-term
    {
        return lcopy + 1; // need a buffer this big
    }

    strncpy(buffer, start, lcopy);

    buffer[lcopy] = 0;

    *next = (char*)ptr; // ptr may be the null-term, so don't step past it

    return 0;
}

The usage for that would be:


Code:
#include <cstdio>  // printf

int main()
{
    const int bufsz = 1024;

    char buf[bufsz];

    // characters that may appear in a token
    const char * accept = "abcdefghijklmnopq"
                          "rstuvwxyzABCDEFGH"
                          "IJKLMNOPQRSTUVWXYZ";

    char data[] = "This    data...needs   to be  parsed  ";

    char * iter = data;

    while(0 == token(buf, bufsz, &iter, accept))
    {
        printf("Token: '%s'.\n", buf);
    }

    return 0;
}
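
For comparison, the standard library's strcspn and strspn cover the same ground as the two hand-written helpers. The following is only a sketch of that alternative, not a drop-in replacement for token() above: next_run is a hypothetical name, and it returns the token in a std::string instead of a fixed-size buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: extracts the next run of 'accept' characters,
// starting the search at *next. Returns false when no run is left.
bool next_run(const char ** next, const char * accept, std::string & out)
{
    // strcspn counts leading chars NOT in 'accept', so this skips to the token
    const char * start = *next + std::strcspn(*next, accept);

    // strspn counts leading chars that ARE in 'accept': the token length
    std::size_t length = std::strspn(start, accept);

    if(length == 0)
    {
        return false; // reached the end of the string
    }

    out.assign(start, length);

    *next = start + length; // resume after this token on the next call

    return true;
}
```

It would be driven the same way as the loop above: keep calling it with the address of a const char * iterator until it returns false.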
Sebastiani
03-08-2004, 05:31 PM   #5
Registered User
 
Join Date: Apr 2003
Posts: 17
Woah... thanks... I'm going to have to sit down with a cup of coffee and begin attempting to "translate" all of that in my mind... haha...thanks.
MisterWonderful