| | #1 |
| Registered User Join Date: Apr 2003
Posts: 17
| So I know what parsing is. But what I'm wondering is, what are the rudimentary basics of parsing and string separation, in terms of programming, and how they work logically? Many thanks, I know that this is a bit vague, I tried to focus my question as well as I could.
__________________ "None are more hopelessly enslaved than those who falsely believe they are free." -Goethe [paraphrased] "Those who would sacrifice essential liberties for a little temporary safety deserve neither." -Benjamin Franklin |
| MisterWonderful is offline | |
| | #2 |
| Registered User Join Date: Jan 2004
Posts: 33
| It depends on what you're looking for. If you want to take the string "This is a string. This is another part of this string." and separate it into two strings (one for each sentence), you would iterate until you found a period. If you wanted to parse all the words out of a string into multiple strings, you could use two pointers: move one to the start of a word, advance the other until it reaches a non-alphanumeric character, take the difference between the two pointers as the word's length, and copy that many characters into a new string. There are multiple ways of parsing strings, and all of them rely on some form of iteration.
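The two-pointer approach described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the name `next_word`, its signature, and the choice to truncate over-long words are all assumptions. Splitting into sentences would be the same loop with `'.'` as the stopping condition instead of `isalnum`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): copies the next run of
 * alphanumeric characters at or after *pos into out and advances *pos
 * past it. Returns 1 if a word was copied, 0 when no word is left.
 * Words longer than outsz - 1 characters are truncated.
 */
int next_word(const char **pos, char *out, size_t outsz)
{
    const char *first = *pos;

    /* Skip separators: anything that is not a letter or digit. */
    while (*first && !isalnum((unsigned char)*first))
        ++first;
    if (*first == '\0' || outsz == 0) {
        *pos = first;
        return 0;
    }

    /* Advance a second pointer to one past the end of the word. */
    const char *last = first;
    while (*last && isalnum((unsigned char)*last))
        ++last;

    /* The difference between the two pointers is the word length. */
    size_t len = (size_t)(last - first);
    if (len > outsz - 1)
        len = outsz - 1;
    memcpy(out, first, len);
    out[len] = '\0';

    *pos = last;
    return 1;
}
```

A caller would point a `const char *` at the text and call `next_word` in a loop until it returns 0.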
__________________ “Focused, hard work is the real key to success. Keep your eyes on the goal, and just keep taking the next step towards completing it. " -John Carmack |
| /Muad'Dib\ is offline | |
| | #3 |
| Registered User Join Date: Mar 2002
Posts: 1,595
| In my mind parsing involves taking a given original string and looking for a given set of targets--which may be substrings or single characters. The full string is either physically or logically subdivided into resultant substrings (aka tokens) using some protocol if a target is found. |
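To make the "physically or logically subdivided" distinction concrete, here is a small sketch using only standard `<string.h>` functions. The `struct span` type and the names `next_span` and `count_tokens` are made up for illustration. The logical version only reports where a token starts and how long it is; the physical version uses `strtok`, which writes `'\0'` into the original string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a token described as a view into the original string. */
struct span {
    const char *start;
    size_t len;
};

/*
 * Logical subdivision: locate the next token at or after s without
 * modifying s. targets is the set of separator characters.
 * The returned span has len == 0 when no token is left.
 */
struct span next_span(const char *s, const char *targets)
{
    struct span tok;
    s += strspn(s, targets);        /* skip leading separators */
    tok.start = s;
    tok.len = strcspn(s, targets);  /* length up to the next separator */
    return tok;
}

/*
 * Physical subdivision: strtok overwrites the separator that ends each
 * token with '\0', so text must be writable. Returns the token count.
 */
size_t count_tokens(char *text, const char *targets)
{
    size_t n = 0;
    char *tok;
    for (tok = strtok(text, targets); tok != NULL; tok = strtok(NULL, targets))
        ++n;
    return n;
}
```

To walk a string logically, call `next_span` again starting at `tok.start + tok.len` until the returned length is 0.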
| elad is offline | |
| | #4 |
| Guest Join Date: Aug 2001
Posts: 4,923
| Here's an example of two very rudimentary parsing implementations to illustrate the basic concept. I hope it helps. Both use a pointer to a pointer into a buffer (i.e., a char**) to keep track of the current position. The first example simply reads in tokens that are separated by one or more of a certain character (usually a space). Code:
#include <string.h> /* strncpy */

/*
return values:
0 : success
-1 : done
> 0 : buffer too small, need this
many bytes for next token.
*/
int token(
    char * buffer,
    int max,
    char ** next,
    char sep)
{
    const char * ptr, * start = NULL;
    for(ptr = *next; *ptr; ++ptr)
    {
        if(*ptr != sep)
        {
            if(start == NULL)
            {
                start = ptr; // first char in token
            }
        }
        else
        {
            if(start != NULL)
            {
                break; // ready to copy
            }
        }
    }
    if(start == NULL)
    {
        return -1; // done
    }
    int lcopy = (int)(ptr-start);
    if(lcopy > max - 1) // max - 1 for null-term
    {
        return lcopy + 1; // need a buffer this big
    }
    strncpy(buffer, start, lcopy);
    buffer[lcopy] = 0;
    *next = (char*)ptr; // resume from the separator (or the terminator)
    return 0;
}
Its usage would be like this: Code:
#include <stdio.h> /* printf */

int main(void)
{
    enum { bufsz = 1024 }; // compile-time constant, so buf is a plain array
    char buf[bufsz];
    char data[] = "This data...needs to be parsed ";
    char * iter = data;
    while(0 == token(buf, bufsz, &iter, ' '))
    {
        printf("Token: '%s'.\n", buf);
    }
    return 0;
}
The next one takes a slightly different approach: it skips over any character that isn't in a given set of accepted characters (the 'token-type' string) and treats each run of accepted characters as a token. It uses two helper functions, find_first_of and find_first_not_of, to accomplish that. Code:
#include <stdbool.h> /* bool (needed in C, built in for C++) */
#include <stddef.h>  /* NULL */
#include <string.h>  /* strlen, strncpy */

// first char in str that appears in find, or NULL
const char * find_first_of(
    const char * str,
    const char * find)
{
    const char * ptr;
    for( ; *str; ++str)
    {
        for(ptr = find; *ptr; ++ptr)
        {
            if(*str == *ptr)
            {
                return str;
            }
        }
    }
    return NULL;
}

// first char in str that does NOT appear in find, or NULL
const char * find_first_not_of(
    const char * str,
    const char * find)
{
    const char * ptr;
    bool found;
    for( ; *str; ++str)
    {
        found = false;
        for(ptr = find; *ptr; ++ptr)
        {
            if(*str == *ptr)
            {
                found = true;
                break;
            }
        }
        if(!found)
        {
            return str;
        }
    }
    return NULL;
}

/*
return values:
0 : success
-1 : done
> 0 : buffer too small, need this
many bytes for next token.
*/
int token(
    char * buffer,
    int max,
    char ** next,
    const char * match)
{
    const char * start = find_first_of(*next, match);
    if(start == NULL)
    {
        return -1;
    }
    const char * ptr = find_first_not_of(start, match);
    if(ptr == NULL)
    {
        // token runs to the end of the string: stop at the terminator
        // so the last character is not dropped
        ptr = start + strlen(start);
    }
    int lcopy = (int)(ptr-start);
    if(lcopy > max - 1)
    {
        return lcopy + 1;
    }
    strncpy(buffer, start, lcopy);
    buffer[lcopy] = 0;
    *next = (char*)ptr; // never step past the terminator
    return 0;
}
The usage for that would be: Code:
#include <stdio.h> /* printf */

int main(void)
{
    enum { bufsz = 1024 }; // compile-time constant, so buf is a plain array
    char buf[bufsz];
    const char * accept = "abcdefghijklmnopq"
                          "rstuvwxyzABCDEFGH"
                          "IJKLMNOPQRSTUVWXYZ";
    char data[] = "This data...needs to be parsed ";
    char * iter = data;
    while(0 == token(buf, bufsz, &iter, accept))
    {
        printf("Token: '%s'.\n", buf);
    }
    return 0;
}
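As a side note, the standard C library already provides the two searches that the helper functions perform: `strcspn` measures a leading run of characters that are not in a set, and `strspn` measures a leading run of characters that are. A possible variant of the set-based tokenizer built on them might look like the sketch below. The name `token_std` is made up here, and the return-value convention is simply copied from the example above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical variant of the set-based token() shown above, using the
 * standard strcspn/strspn instead of hand-written search helpers.
 * return values:
 *    0 : success
 *   -1 : done
 *  > 0 : buffer too small, need this many bytes for next token.
 */
int token_std(char *buffer, int max, char **next, const char *match)
{
    /* strcspn: length of the leading run of chars NOT in match. */
    char *start = *next + strcspn(*next, match);
    if (*start == '\0')
        return -1; /* only non-matching chars were left */

    /* strspn: length of the leading run of chars that ARE in match. */
    size_t lcopy = strspn(start, match);
    if (max <= 0 || lcopy > (size_t)(max - 1))
        return (int)lcopy + 1; /* room for the token plus terminator */

    memcpy(buffer, start, lcopy);
    buffer[lcopy] = '\0';
    *next = start + lcopy; /* first char after the token, or the terminator */
    return 0;
}
```

It can be dropped into the same `while(0 == ...)` loop as the usage example above.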
|
| Sebastiani is offline | |
| | #5 |
| Registered User Join Date: Apr 2003
Posts: 17
| Woah... thanks... I'm going to have to sit down with a cup of coffee and begin attempting to "translate" all of that in my mind... haha...thanks.
| MisterWonderful is offline | |