Thread: simple question XML parser C

  1. #1
    Registered User
    Join Date
    Jul 2011
    Posts
    12

    simple question XML parser C

    Hi everybody, I'm new to C programming and to Expat. I have to do a project that uses Expat as its XML parser. The goal is to parse this XML file:

    Code:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
        <conf>
            <network>
                <config_interface_loopback>
                    <option>
                        <ifname>lo</ifname>
                        <proto>static</proto>
                        <ipaddr>127.0.0.1</ipaddr>
                        <netmask>255.0.0.0</netmask>
                    </option>
                </config_interface_loopback>
                <config_interface_lan>
                    <option>
                        <ifname>eth0</ifname>
                        <type>bridge</type>
                        <proto>static</proto>
                        <ipaddr>192.168.1.1</ipaddr>
                        <netmask>255.255.255.0</netmask>
                    </option>
                </config_interface_lan>
                <config_interface_wifi>
                    <option>
                        <proto>static</proto>
                        <ipaddr>192.168.2.1</ipaddr>
                        <netmask>255.255.255.0</netmask>
                    </option>
                </config_interface_wifi>
            </network>
            <wireless>
                <config_wifi-device_radio0>
                    <option>
                        <type>atheros</type>
                        <channel>auto</channel>
                        <macaddr>00:15:6d:fc:71:ac</macaddr>
                        <disable>0</disable>
                    </option>
                </config_wifi-device_radio0>
                <config_wifi-iface>
                    <option>
                        <device>radio0</device>
                        <network>wifi</network>
                        <mode>ap</mode>
                        <ssid>OpenWrt</ssid>
                        <encryption>none</encryption>
                    </option>
                </config_wifi-iface>
            </wireless>
        </conf>
    </note>
    I'd like to turn it into a file named network, with these contents:
    Code:
    config interface loopback
            option ifname   lo
            option proto    static
            option ipaddr   127.0.0.1
            option netmask  255.0.0.0
    
    config interface lan
            option ifname   eth0
            option type     bridge
            option proto    static
            option ipaddr   192.168.1.1
            option netmask  255.255.255.0
    
    config interface wifi
            option proto    static
            option ipaddr   192.168.2.1
            option netmask  255.255.255.0
    So I tried to use Expat myself. But starting from the example that ships with Expat, I can't write a parser that shows me the contents of the elements. This is the source code after my changes:
    Code:
    #include <stdio.h>
    #include <stdlib.h>   /* exit() */
    #include <expat.h>
    
    #if defined(__amigaos__) && defined(__USE_INLINE__)
    #include <proto/expat.h>
    #endif
    
    #ifdef XML_LARGE_SIZE
    #if defined(XML_USE_MSC_EXTENSIONS) && _MSC_VER < 1400
    #define XML_FMT_INT_MOD "I64"
    #else
    #define XML_FMT_INT_MOD "ll"
    #endif
    #else
    #define XML_FMT_INT_MOD "l"
    #endif
    
    #define BUFFSIZE        8192
    
    char Buff[BUFFSIZE];
    
    int Depth;
    
    static void XMLCALL
    start(void *data, const char *el, const char **attr)
    {
      int i;
    
      for (i = 0; i < Depth; i++)
        printf("  ");
    
      printf("%s", el);
    
      for (i = 0; attr[i]; i += 2) {
        printf(" %s='%s'", attr[i], attr[i + 1]);
      }
    
      printf("\n");
      Depth++;
    }
    
    static void XMLCALL
    end(void *data, const char *el)
    {
      Depth--;
    }
    
    int
    main(int argc, char *argv[])
    {
      XML_Parser p = XML_ParserCreate(NULL);
      if (! p) {
        fprintf(stderr, "Couldn't allocate memory for parser\n");
        exit(-1);
      }
    
      XML_SetElementHandler(p, start, end);
    
      for (;;) {
        int done;
        int len;
    
        len = (int)fread(Buff, 1, BUFFSIZE, stdin);
        if (ferror(stdin)) {
          fprintf(stderr, "Read error\n");
          exit(-1);
        }
        done = feof(stdin);
    
        if (XML_Parse(p, Buff, len, done) == XML_STATUS_ERROR) {
          fprintf(stderr, "Parse error at line %" XML_FMT_INT_MOD "u:\n%s\n",
                  XML_GetCurrentLineNumber(p),
                  XML_ErrorString(XML_GetErrorCode(p)));
          exit(-1);
        }
    
        if (done)
          break;
      }
      XML_ParserFree(p);
      return 0;
    }
    I'd like the parser to print all the contents of the file, but it doesn't work. Can somebody help me a little, please? Thanks a lot.

  2. #2
    MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    I'm guessing not a lot of people actually use expat, so if you don't get help here you might want to try their mailing list:

    mail.libexpat.org Mailing Lists

    Presuming that is active, since the development of expat itself is not.

    $0.02: I've written an event-driven HTML parser in C and it was much easier than I thought it would be; I think the whole thing is about 500 lines/16k of code. There's an undocumented, open source "tiny XML parser" at CCAN that's half that size:

    ccan

    So, if there is a particular reason you need to do this in C (as opposed to perl or python or whatever, for which there will certainly be popular, well maintained and very easy to use modules for XML), you might want to consider writing one yourself that suits your purposes. If there isn't a particular reason to do it in C, you could probably learn all the perl or python you need in a day...
    Last edited by MK27; 07-12-2011 at 09:35 AM.

  3. #3
    Registered User
    Join Date
    Jul 2011
    Posts
    12
    Thank you for your help. The problem is that the OS I work on is OpenWrt, a small modular Linux system. It takes up just 4M, so naturally it doesn't support Perl, Python or Java... Thank you for the mailing list, I hope someone there can help me too.

  4. #4
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    I couldn't find any great documentation on XML/Expat, but this looks sufficient if you bother to read carefully and experiment a bit: XML.com: Using Expat.

    The first thing you need to do is set up a character data handler. This processes the information between the tags (e.g. "127.0.0.1" and the other values). Take a look at the XML_SetCharacterDataHandler function.
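
    One detail that is easy to miss: Expat may deliver the text of a single element in several calls to the character data handler, and the text it passes is not NUL-terminated. A minimal sketch of a handler that appends each chunk to a bounded buffer follows. The names text_buf_t, text_reset and on_chardata are made up for this example, and the handler takes a plain const char * so that the sketch compiles without expat.h. With Expat you would presumably declare it as static void XMLCALL on_chardata(void *userData, const XML_Char *s, int len) and register it with XML_SetCharacterDataHandler.

```c
#include <string.h>

/* Hypothetical accumulator for the text of the element being parsed. */
typedef struct {
    char   text[128];   /* always kept NUL-terminated */
    size_t used;        /* bytes stored so far, excluding the '\0' */
} text_buf_t;

/* Forget any previous text; call this from the start-element handler. */
static void text_reset(text_buf_t *b)
{
    b->used = 0;
    b->text[0] = '\0';
}

/* Same shape as an Expat character data handler: s holds len bytes and
 * is NOT NUL-terminated. Chunks are appended; overflow is truncated. */
static void on_chardata(void *userData, const char *s, int len)
{
    text_buf_t *b = userData;
    size_t room = sizeof b->text - 1 - b->used;
    size_t n = (len < 0) ? 0 : (size_t)len;

    if (n > room)
        n = room;
    memcpy(b->text + b->used, s, n);
    b->used += n;
    b->text[b->used] = '\0';
}
```

    The end-element handler is then the natural place to look at the complete text, since by that point every chunk has arrived.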

    The second thing you need to do is give Expat a data structure to fill with the information it extracts from the XML file. I usually make this represent the data as I want to work with it, not as it appears in the XML. For your case, I would probably make a linked list of interfaces, containing a name and a linked list of options:

    Code:
    typedef struct option_s option_t;
    struct option_s {
        char *key;
        char *value;
        option_t *next;
    };
    
    typedef struct interface_s interface_t;
    struct interface_s {
        char *name;
        option_t *opt_list;
        interface_t *next;
    };
    Create a list in your main function, something like interface_t *foo. Then, you need to tie foo to your XML_Parser object using XML_SetUserData(p, foo).

    Now, whenever you hit your start or end element handlers, or your data handler, the first parameter to those functions will be a pointer to your list foo. In your start, end and data handlers, you will do things like create a new node for your list, add a new option, add the new node to your interface list, etc.
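
    As a rough sketch of the list operations those handlers would call, here are two append helpers built on the structs above. One assumption that differs slightly from the description: because the handlers have to update the head of the list, the sketch passes a pointer to a small wrapper struct (config_t, a made-up name) as the user data rather than the bare list pointer. add_interface, add_option and dup_str are likewise hypothetical names, and freeing the lists is left out for brevity.

```c
#include <stdlib.h>
#include <string.h>

typedef struct option_s option_t;
struct option_s {
    char *key;
    char *value;
    option_t *next;
};

typedef struct interface_s interface_t;
struct interface_s {
    char *name;
    option_t *opt_list;
    interface_t *next;
};

/* Hypothetical wrapper to hand to XML_SetUserData(): the handlers need
 * to update head and tail, so they receive a pointer to this struct. */
typedef struct {
    interface_t *head;
    interface_t *tail;   /* interface currently being filled */
} config_t;

/* Portable stand-in for strdup(); returns NULL on allocation failure. */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Append a new interface; a start handler would call this when it sees
 * an element such as config_interface_lan. Returns NULL on failure. */
static interface_t *add_interface(config_t *cfg, const char *name)
{
    interface_t *ifc = calloc(1, sizeof *ifc);
    if (ifc == NULL || (ifc->name = dup_str(name)) == NULL) {
        free(ifc);
        return NULL;
    }
    if (cfg->tail != NULL)
        cfg->tail->next = ifc;
    else
        cfg->head = ifc;
    cfg->tail = ifc;
    return ifc;
}

/* Append a key/value pair to the end of an interface's option list. */
static option_t *add_option(interface_t *ifc, const char *key, const char *value)
{
    option_t **link = &ifc->opt_list;
    option_t *opt = calloc(1, sizeof *opt);
    if (opt == NULL)
        return NULL;
    opt->key = dup_str(key);
    opt->value = dup_str(value);
    if (opt->key == NULL || opt->value == NULL) {
        free(opt->key);
        free(opt->value);
        free(opt);
        return NULL;
    }
    while (*link != NULL)        /* walk to the end to keep file order */
        link = &(*link)->next;
    *link = opt;
    return opt;
}
```

    With this layout the end handler for ifname, proto and so on could call add_option(cfg->tail, el, text), and writing the output file becomes a simple walk over cfg->head.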

    Hopefully that gets you going in the right direction. Spend some more time going over the docs and experimenting with some of this. Start with just getting the parser to spit back the XML tags and data in order, then worry about storing the data in your user data structure. Come back if you have more questions and we'll give you a hand.

  5. #5
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Quote Originally Posted by RoxPro View Post
    Thank you for your help. The problem is that the OS I work on is OpenWrt, a small modular Linux system. It takes up just 4M, so naturally it doesn't support Perl, Python or Java... Thank you for the mailing list, I hope someone there can help me too.
    You don't need the Expat lib if you're familiar with the APIs for regex pattern matching.
    Read the man page for regcomp() / regexec() and study the rm_so and rm_eo sections carefully.
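
    For illustration, a minimal sketch of that idea using the POSIX regcomp()/regexec() calls: rm_so and rm_eo give the start and end offsets of each match, so the text between an opening and closing tag can be copied out of the line. The function name extract_tag is made up for this example. It assumes the whole element sits on one line and that the tag name contains no regex metacharacters, so it only suits simple, predictable input and is not a substitute for a real XML parser.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Copy the text between <tag> and </tag> on one line into out.
 * Returns 0 on success, -1 if there is no match or out is too small. */
static int extract_tag(const char *line, const char *tag,
                       char *out, size_t outsz)
{
    char pattern[128];
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = the (...) group */
    int rc = -1;
    int plen;

    /* Build e.g. "<ipaddr>([^<]*)</ipaddr>" */
    plen = snprintf(pattern, sizeof pattern, "<%s>([^<]*)</%s>", tag, tag);
    if (plen < 0 || (size_t)plen >= sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 2, m, 0) == 0) {
        /* rm_so/rm_eo are byte offsets into line for the group */
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n < outsz) {
            memcpy(out, line + m[1].rm_so, n);
            out[n] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

    Calling extract_tag(line, "ipaddr", buf, sizeof buf) on each line read with fgets() would pull out one value at a time; nesting, attributes and elements that span lines are all outside what this sketch handles.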

  6. #6
    Fordy
    Join Date
    Aug 2001
    Posts
    5,793
    If you need a quick and easy library to just parse simple XML (no malformed data, no validating, no writing), then have a look at ezXML.
    I've used it a few times for things like reading XML config files and found it very easy to use. It can also be added to your source as a C file and header, so there are no library dependencies to worry about either.

  7. #7
    Registered User
    Join Date
    Jul 2011
    Posts
    12
    I'm not familiar with the APIs for regex pattern matching... And I'm new to C too... That's my biggest problem, and it's why I'm asking for help here...

  8. #8
    Registered User
    Join Date
    Jul 2011
    Posts
    12
    I already did this job (the parser) once in Korn shell using sed, which I think is perhaps about the same thing as ezXML. But my boss would not accept that approach, because he wants the parser done in a safer way. That means that when we change the contents of the XML file, the parser should still parse it correctly...

  9. #9
    Registered User
    Join Date
    Jul 2011
    Posts
    12
    Thank you for your help. I just read the document describing all the functions, written by Clark in 1999. It helps, but not enough. I'll keep trying to write the parser myself, and if I still have trouble I'll come back and ask.

  10. #10
    Registered User
    Join Date
    Jul 2011
    Posts
    12
    Sorry to disturb you, but I need a hand now...
    I've been trying things for the past 3 weeks, but I still have a small problem.
    First:
    Does Expat support parsing only the tags that I choose?
    I tried to use
    XML_SetElementHandler to select which tag's content I want to parse, but I can't...

    Second, I tried to get all the data in the tags using a CharacterDataHandler,
    but that failed too... So I came to ask for a little help, thanks!

  11. #11
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    This is your code, with a few additions
    - a character handler
    - showing how to pass user data around, to record bits of information and state as we go.

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <expat.h>
    
    #if defined(__amigaos__) && defined(__USE_INLINE__)
    #include <proto/expat.h>
    #endif
    
    #ifdef XML_LARGE_SIZE
    #if defined(XML_USE_MSC_EXTENSIONS) && _MSC_VER < 1400
    #define XML_FMT_INT_MOD "I64"
    #else
    #define XML_FMT_INT_MOD "ll"
    #endif
    #else
    #define XML_FMT_INT_MOD "l"
    #endif
    
    #define BUFFSIZE        8192
    
    char Buff[BUFFSIZE];
    
    int Depth;
    
    typedef enum {
      S_NONE,
      S_IFNAME,
      S_PROTO,
      S_IPADDR,
      S_NETMASK,
    } state_et;
    
    typedef struct {
      state_et  state;
      char  interface[100];
      char  ifname[100];
      char  proto[100];
      char  ipaddr[100];
      char  netmask[100];
    } info_st;
    
    static void XMLCALL
    chardata(void *userData, const XML_Char *s, int len)
    {
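      info_st   *info = userData;
      char      *dest = NULL;     /* field to fill for the current element */

      switch ( info->state ) {
        case S_IFNAME:  dest = info->ifname;  break;
        case S_PROTO:   dest = info->proto;   break;
        case S_IPADDR:  dest = info->ipaddr;  break;
        case S_NETMASK: dest = info->netmask; break;
        default:        break;    /* S_NONE: text we are not interested in */
      }
      if ( dest != NULL ) {
        /* every field is char[100]; clamp so a long value cannot overflow */
        if ( len > 99 ) len = 99;
        memcpy(dest, s, (size_t)len);
        dest[len] = '\0';
      }
      /* only the first chunk of text after a start tag is kept */
      info->state = S_NONE;
    //  printf(">-%.*s-<",len,s);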
    }
    
    static void XMLCALL
    start(void *data, const char *el, const char **attr)
    {
      info_st   *info = data;
      int i;
    
      {
        char temp[100];
        /* %99s leaves room for the terminating '\0' in temp */
        if ( sscanf( el, "config_interface_%99s", temp) == 1 ) {
          strcpy(info->interface,temp);
        }
        if ( strcmp(el,"ifname")==0) info->state = S_IFNAME;
        if ( strcmp(el,"proto")==0) info->state = S_PROTO;
        if ( strcmp(el,"ipaddr")==0) info->state = S_IPADDR;
        if ( strcmp(el,"netmask")==0) info->state = S_NETMASK;
      }
      for (i = 0; i < Depth; i++)
        printf("  ");
    
      printf("%s", el);
    
      for (i = 0; attr[i]; i += 2) {
        printf(" %s='%s'", attr[i], attr[i + 1]);
      }
    
      printf("\n");
      Depth++;
    }
    
    static void XMLCALL
    end(void *data, const char *el)
    {
      info_st   *info = data;
      {
        char temp[100];
        if ( sscanf( el, "config_interface_%99s", temp) == 1 ) {
          printf("DATA=%s %s %s %s %s\n",
                 info->interface,
                 info->ifname,
                 info->proto,
                 info->ipaddr,
                 info->netmask);
        }
      }
      Depth--;
    }
    
    int
    main(int argc, char *argv[])
    {
      info_st   info = { 0 };
      XML_Parser p = XML_ParserCreate(NULL);
      if (! p) {
        fprintf(stderr, "Couldn't allocate memory for parser\n");
        exit(-1);
      }
    
      XML_SetElementHandler(p, start, end);
      XML_SetCharacterDataHandler(p,chardata);
      XML_SetUserData(p, &info);
    
      for (;;) {
        int done;
        int len;
    
        len = (int)fread(Buff, 1, BUFFSIZE, stdin);
        if (ferror(stdin)) {
          fprintf(stderr, "Read error\n");
          exit(-1);
        }
        done = feof(stdin);
    
        if (XML_Parse(p, Buff, len, done) == XML_STATUS_ERROR) {
          fprintf(stderr, "Parse error at line %" XML_FMT_INT_MOD "u:\n%s\n",
                  XML_GetCurrentLineNumber(p),
                  XML_ErrorString(XML_GetErrorCode(p)));
          exit(-1);
        }
    
        if (done)
          break;
      }
      XML_ParserFree(p);
      return 0;
    }
    The extra information printed looks like this
    DATA=loopback lo static 127.0.0.1 255.0.0.0
    DATA=lan eth0 static 192.168.1.1 255.255.255.0
    DATA=wifi eth0 static 192.168.2.1 255.255.255.0
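
    Note that the last line shows eth0 for wifi even though the wifi section of the XML has no ifname element: the value is left over from lan, because the fields are never cleared between interfaces. One possible fix, sketched here with a made-up helper name (begin_interface), is to wipe the record whenever a new config_interface_ element starts. The type definitions are repeated only so the sketch stands on its own.

```c
#include <stdio.h>
#include <string.h>

typedef enum { S_NONE, S_IFNAME, S_PROTO, S_IPADDR, S_NETMASK } state_et;

typedef struct {
    state_et state;
    char interface[100];
    char ifname[100];
    char proto[100];
    char ipaddr[100];
    char netmask[100];
} info_st;

/* Hypothetical helper: start a fresh record for the interface `name`.
 * Zeroing the struct empties every string and sets state to S_NONE. */
static void begin_interface(info_st *info, const char *name)
{
    memset(info, 0, sizeof *info);
    /* snprintf truncates safely if name is longer than the field */
    snprintf(info->interface, sizeof info->interface, "%s", name);
}
```

    In start(), the strcpy(info->interface,temp) call would then become begin_interface(info, temp), and an interface without ifname would print an empty field instead of a stale one.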
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Registered User
    Join Date
    Jul 2011
    Posts
    12
    Quote Originally Posted by Salem View Post
    Thanks for your help, first of all. But I think you may not have understood me, because I want my data printed like the model in the first post. But thanks anyway!!

  13. #13
    Registered User
    Join Date
    Jul 2011
    Posts
    12
    Quote Originally Posted by anduril462 View Post
    Sorry to disturb you, but I need a hand now...
    I've been trying things for the past 3 weeks, but I still have a small problem.
    First:
    Does Expat support parsing only the tags that I choose?
    I tried to use
    XML_SetElementHandler to select which tag's content I want to parse, but I can't...

    Second, I tried to get all the data in the tags using a CharacterDataHandler,
    but that failed too... So I came to ask for a little help, thanks!

  14. #14
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > I want my data printed like the model in the first post. But thanks anyway!!
    I'm well aware of that, which is why I deliberately stopped short of the goal.
    I'm expecting you to pick up the ball and take it to the line.

    The information is extracted, now all you need to do is tart up the printf statements to make it work for you.
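
    As a sketch of what that last step might look like, the helper below prints one record in the layout requested in the first post. The names print_option and print_interface are made up for this example, and the type definitions are repeated only so it stands on its own. It assumes the fields are cleared at the start of each interface, so that an empty string means the XML had no such element. The type option shown for lan in the target output is not covered, since the struct has no field for it; that would need one more field and state.

```c
#include <stdio.h>

typedef enum { S_NONE, S_IFNAME, S_PROTO, S_IPADDR, S_NETMASK } state_et;

typedef struct {
    state_et state;
    char interface[100];
    char ifname[100];
    char proto[100];
    char ipaddr[100];
    char netmask[100];
} info_st;

/* Print one "option" line, skipping fields the XML did not provide. */
static void print_option(FILE *out, const char *key, const char *value)
{
    if (value[0] != '\0')
        fprintf(out, "        option %-8s %s\n", key, value);
}

/* Hypothetical replacement for the DATA= printf in end(). */
static void print_interface(FILE *out, const info_st *info)
{
    fprintf(out, "config interface %s\n", info->interface);
    print_option(out, "ifname",  info->ifname);
    print_option(out, "proto",   info->proto);
    print_option(out, "ipaddr",  info->ipaddr);
    print_option(out, "netmask", info->netmask);
    fputc('\n', out);            /* blank line between sections */
}
```

    Passing stdout keeps the current behaviour of printing to the terminal; passing a FILE * obtained from fopen("network", "w") would write the file instead.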

  15. #15
    Registered User
    Join Date
    Jul 2011
    Posts
    12
    Quote Originally Posted by Salem View Post
    > I want my data printed like the model in the first post. But thanks anyway!!
    I'm well aware of that, which is why I deliberately stopped short of the goal.
    I'm expecting you to pick up the ball and take it to the line.

    The information is extracted, now all you need to do is tart up the printf statements to make it work for you.
    OK, I'll work on that. If I still have questions afterwards, I'll ask you for help! Thanks!
