Thread: Extracting data from a string..

  1. #1
    C/C++ Learner & Lover
    Join Date
    Aug 2008
    Location
    Argentina
    Posts
    193

    Extracting data from a string..

    Well.. I have a question about my code. How can I extract the content from a string like this: "type=1;ammount=1212;name=motorbike;"?
    For example, I want to extract the values of type, ammount and name so I can use them later. Any examples or ideas? Examples will be appreciated =)

  2. #2
    master5001 (Banned)
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Yep. Several ideas. You could use strtok() (which will immediately be followed by Elysia pushing her own strtok(), which is good; use either hers or the standard C one, I don't care), followed by a call to either sscanf(), or strchr() and then sscanf(), on each piece.

  3. #3
    Dino (Jack of many languages)
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Divide and conquer.
    Code:
    Find next semicolon... 
        Find next equal sign... 
            Left of equal sign is key 
            Right of equal sign is value 
        Repeat until no more semicolons.
    Mainframe assembler programmer by trade. C coder when I can.

  4. #4
    master5001 (Banned)
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    If you want a specific example (though admittedly in other programming languages), just google "parsing cookies."

  5. #5
    C/C++ Learner & Lover
    Join Date
    Aug 2008
    Location
    Argentina
    Posts
    193
    I've read this in the forum:
    Code:
    char *str; // <- Why is it a pointer?
    char line[]="somefile:anotherfile";
    while (str != NULL) {
      str = strtok(line, ":"); // <- this will store "somefile"
      str = strtok(NULL, ":"); // <- now str will contain "anotherfile"
    }
    What if I need to parse 3 things? Do I have to add another NULL as the first parameter of strtok?
    And I have 2 parameters, the start and the end. What do you suggest I do?
    Last edited by lautarox; 09-23-2008 at 12:00 PM.

  6. #6
    MK27 (spurious conceit)
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    The example you are using is NOT a good one!

    Better:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main(void) {
            char string[]="type=1;ammount=1212;name=motorbike;", *tok;
            tok=strtok(string,";");             /* first call: pass the array */
            while(tok != NULL) {
                    puts(tok);
                    tok=strtok(NULL,";");       /* later calls: pass NULL */
            }
            return 0;
    }
    Notice that you only pass "string" as the first argument to strtok the first time.

    Quote Originally Posted by lautarox View Post
    What if I need to parse 3 things? Do I have to add another NULL as the first parameter of strtok?
    And I have 2 parameters, the start and the end. What do you suggest I do?
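    Only the first call should pass the actual char array; every later call passes NULL, however many fields there are. The loop in your example will not work as a general loop because it calls strtok(line, ":") again on every pass, which restarts the tokenizing from the beginning (and str is tested before it has ever been assigned).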

    My output looks like this:

    type=1
    ammount=1212
    name=motorbike
    Last edited by MK27; 09-23-2008 at 12:07 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  7. #7
    master5001 (Banned)
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    If it were me writing this sort of program (and believe me, I have had to parse many a string like that in the past), I would start with a data structure like this.

    Example:
    Code:
    enum E_VARIABLE_TYPE
    {
      EVT_UNK,
      EVT_INT,
      EVT_STR,
      EVT_FLT
    };
    
    struct variable
    {
      enum E_VARIABLE_TYPE type;
      const char *name;
      union
      {
        const char *str_val;
        int int_val;
        float flt_val;
      }; /* Anonymous unions like this one are standard since C11 and were a
          * common compiler extension before that. If your compiler rejects it,
          * give the union a name (e.g. "value") and access the members through it.
          */
    };
    Then split the string at the semicolons to separate each "variable", and work your magic to determine the type of each one. You could also make the variable structure a linked list or something like that. It's just a suggestion, though; I don't know how static your string is going to be, and that makes a difference.
    Last edited by master5001; 09-23-2008 at 12:15 PM.

  8. #8
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by lautarox View Post
    I've read in the forum this..
    Code:
    char *str; // <- Why is it a pointer?
    char line[]="somefile:anotherfile";
    while (str != NULL) {
      str = strtok(line, ":"); // <- this will store "somefile"
      str = strtok(NULL, ":"); // <- now str will contain "anotherfile"
    }
    What if I need to parse 3 things? Do I have to add another NULL as the first parameter of strtok?
    And I have 2 parameters, the start and the end. What do you suggest I do?
    NULL means "use the same string you used the last time, oh, and start from where the last one left off". So the first time you call strtok, you pass the string along; after that, you pass in NULL so that the same (now altered) string is used from then on. strtok doesn't make a copy: it writes '\0' into your original array.

  9. #9
    master5001 (Banned)
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    OK, since it merits being said given the strtok() example you found: Elysia has written a revised version of strtok(). You may wish to use hers instead, since it won't damage the original input data. In all honesty, though, when I have done things like this in my projects, changing the original data was not an issue at all.

  10. #10
    C/C++ Learner & Lover
    Join Date
    Aug 2008
    Location
    Argentina
    Posts
    193
    Why doesn't this output what's between "dude=" and ";"?
    Code:
    char lol[]="dude=matt;car=toyota;", *p;
    p=strtok(lol, "dude=");
    p=strtok(NULL, ";");
    puts(p);

  11. #11
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
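    strtok doesn't use multicharacter delimiters. The second argument is a set of single characters, so "dude=" means "stop at any of 'd', 'u', 'e' or '='", not "stop at the text dude=". The first call skips every leading delimiter character (and "dude=" consists entirely of them), then runs until the next one it finds, the '=' after "car", so it returns "matt;car". The second call then picks up at "toyota;" and returns "toyota", which is what puts() prints.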

    You probably want to use "=" as the first tokeniser and ";" as the second.

  12. #12
    master5001 (Banned)
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Because you are using strtok() incorrectly....

    bleh... I hate writing out examples for people like this...

    Example:
    Code:
    #include <ctype.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    
    /* First let me copy and paste my data structure from an earlier post */
    enum E_VARIABLE_TYPE
    {
      EVT_UNK,
      EVT_INT,
      EVT_STR,
      EVT_FLT
    };
    
    struct variable
    {
      enum E_VARIABLE_TYPE type;
      const char *name;
      union
      {
        const char *str_val;
        int int_val;
        float flt_val;
      }; /* Anonymous unions are standard since C11 (a common extension before
          * that). If your compiler rejects it, give the union a name.
          */
    
      /* Let me make this a linked list for simplicity. */
      struct variable *next;
    };
    
    /* Splits buffer in place. Every name/str_val points into buffer, so
     * buffer must stay alive (and unmodified) while the list is in use. */
    struct variable *parse(char *buffer)
    {
      struct variable *head = NULL, **tail = &head, *node;
      char *tok;
    
      /* One node per ';'-separated segment */
      for(tok = strtok(buffer, ";"); tok; tok = strtok(NULL, ";"))
      {
        node = malloc(sizeof(*node));
        if(!node)
        {
          fputs("Cannot complete the parsing! Out of memory!\n", stderr);
          break;
        }
    
        node->type = EVT_UNK;
        node->name = tok;
        node->str_val = NULL;
        node->next = NULL;
        *tail = node;          /* append to the end of the list */
        tail = &node->next;
      }
    
      /* Now parse each segment of data */
      for(node = head; node; node = node->next)
      {
        char *data = strchr(node->name, '=');
    
        if(!data)
          continue;            /* no '=': leave it as EVT_UNK */
    
        *data++ = 0;           /* cut the name off at the '=' */
        node->str_val = data;
        node->type = EVT_INT;
    
        /* Now the type needs to be determined */
        for(; *data; ++data)
        {
          if(isalpha((unsigned char)*data))
          {
            node->type = EVT_STR;
            break;
          }
          if(ispunct((unsigned char)*data))
            node->type = EVT_FLT;
        }
    
        switch(node->type)
        {
          case EVT_INT:
            node->int_val = (int)strtol(node->str_val, NULL, 10);
            break;
          case EVT_FLT:
            node->flt_val = (float)atof(node->str_val);
            break;
          default:
            break;             /* EVT_STR keeps str_val as is */
        }
      }
    
      return head;
    }
    
    /* For the hell of it: write the list back out as "name=value;" text */
    void revert(char *buffer, size_t len, const struct variable *node)
    {
      if(len)
        buffer[0] = '\0';
    
      for(; node; node = node->next)
      {
        int n;
    
        switch(node->type)
        {
          case EVT_INT:
            n = snprintf(buffer, len, "%s=%d;", node->name, node->int_val);
            break;
          case EVT_FLT:
            n = snprintf(buffer, len, "%s=%f;", node->name, node->flt_val);
            break;
          case EVT_STR:
            n = snprintf(buffer, len, "%s=%s;", node->name, node->str_val);
            break;
          default:
            continue;          /* EVT_UNK has no value to print */
        }
    
        /* snprintf returns the length it wanted; stop once output is cut off */
        if(n < 0 || (size_t)n >= len)
          return;
        buffer += n;
        len -= (size_t)n;
      }
    }
    There are many imperfections in this code. It doesn't handle hex values (which it easily could), and it doesn't handle negative ints correctly: a leading '-' counts as punctuation, so "-5" gets classified as a float. Again, that's an easy fix. But it's a start.
    Last edited by master5001; 09-23-2008 at 01:26 PM.

  13. #13
    C/C++ Learner & Lover
    Join Date
    Aug 2008
    Location
    Argentina
    Posts
    193
    Say I get "12233" from the extraction; how can I convert it into an integer?
    Sorry, I didn't see your post.
    Last edited by lautarox; 09-23-2008 at 01:26 PM.

  14. #14
    tabstop (and the Hat of Guessing)
    Join Date
    Nov 2007
    Posts
    14,336
    There's atoi() for that. (Note that the master's sample code used strtol() as well.)

  15. #15
    master5001 (Banned)
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    The main reason I used strtol() is to make it easier for the OP to see where his own code can be expanded to accept hexadecimal and octal input with just a little extra work.
