Thread: Parsing XML file into C structs

  1. #1
    Registered User
    Join Date
    Apr 2009
    Posts
    2

    Parsing XML file into C structs

    I have a massive XML file that I need to parse into C.

    Code:
      <node id="4700-Z$0" x="322" y="-505" z="0.0" /> 
      <node id="4701" x="356" y="-454" z="0.0" /> 
      <node id="4702-H" x="402" y="-456" z="0.0" /> 
      <node id="4704" x="400" y="-428" z="0.0" /> 
      <node id="4705" x="455" y="-426" z="0.0" />
    That's just a small sample of the XML. I want to extract the node id, x, y and z from each element into something like:

    Code:
    typedef struct {
        char node[10];
        int x; int y; int z;
    } graphJ;
    I tried libxml2 but can't get it to build, and I don't think I need it for something this simple... I think?

  2. #2
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Sounds like Mini-XML is what you need.

    I've used it before, it's great.

  3. #3
    Registered User
    Join Date
    Apr 2009
    Posts
    2
    This would be great, except I'm using Visual Studio.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    It doesn't say anywhere that MS Visual Studio DOESN'T work - it says that it WORKS with gcc. I'd give it a go.

    --
    Mats

  5. #5
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Quick and dirty...
    Code:
    	char *tag_id = "id=\"";
    	char *tag_x = "x=\"";
    	char *tag_y = "y=\"";
    	char *tag_z = "z=\"";
    
    	typedef struct {
    		char node[10];
    		int x; int y; int z;
    	} graphJ_def;
    
    	graphJ_def graphJ[10];
    	int n = 0;
    	char *p;
    
    	// for every line input...
    
    	sscanf(strstr(tests, tag_id) + strlen(tag_id), "%s", graphJ[n].node);
    	graphJ[n].node[strlen(graphJ[n].node) - 1] = '\0';	// kill trailing quote
    
    	sscanf(strstr(tests, tag_x) + strlen(tag_x), "%d", &graphJ[n].x);
    
    	sscanf(strstr(tests, tag_y) + strlen(tag_y), "%d", &graphJ[n].y);
    
    	sscanf(strstr(tests, tag_z) + strlen(tag_z), "%d", &graphJ[n].z);
    
    	n++;

  6. #6
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Originally posted by nonoob:
    Quick and dirty...
    Yeah, but it's got so much potential for disaster: any of those strstr()s could fail, there could be some oddity like an entity, the XML might not be ASCII, there could be other tags, the id attribute could contain an x, y or z, etc. (What's the char *p for?)
    And z appears to be a float/double, btw.
    But, like you said, quick and dirty.

    I'd take the time to learn an XML library. If you've not compiled many libraries before, this is a good opportunity. (I've compiled libxml2 on Linux & Windows, granted, I used gcc both times... perhaps they have a binary download?) You'll run into full blown XML parsing sooner or later.
    Perhaps you could also use XSLT to transform it into an easier format. (I'm learning XSLT at the moment... so it's at the top of my head...)
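
    For what it's worth, here is a rough sketch of how the line-by-line approach could guard against a few of the failure modes listed above. It is not a real XML parser: it assumes one node per line, double-quoted attributes, no entities and ASCII input. The names graph_node, attr_value and parse_node are made up for illustration, and z is stored as a double because the sample data has z="0.0".

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    char node[16];
    int x;
    int y;
    double z;   /* sample data has z="0.0", so a double rather than an int */
} graph_node;   /* hypothetical name, not from the thread */

/* Return a pointer just past key in line, or NULL if key is absent.
   Callers pass keys with a leading space, e.g. " x=\"", so that the
   search does not hit "x" inside another attribute's name or value. */
const char *attr_value(const char *line, const char *key)
{
    const char *p = strstr(line, key);
    return p ? p + strlen(key) : NULL;
}

/* Returns 1 on success, 0 if any attribute is missing or malformed. */
int parse_node(const char *line, graph_node *out)
{
    const char *id = attr_value(line, " id=\"");
    const char *x  = attr_value(line, " x=\"");
    const char *y  = attr_value(line, " y=\"");
    const char *z  = attr_value(line, " z=\"");

    if (!id || !x || !y || !z)
        return 0;

    /* %15[^\"] reads at most 15 characters and stops at the closing
       quote, so there is no trailing quote to strip afterwards. */
    if (sscanf(id, "%15[^\"]", out->node) != 1) return 0;
    if (sscanf(x, "%d", &out->x) != 1) return 0;
    if (sscanf(y, "%d", &out->y) != 1) return 0;
    if (sscanf(z, "%lf", &out->z) != 1) return 0;
    return 1;
}
```

    Call parse_node() on each line read with fgets() and skip or report the lines where it returns 0. An id longer than 15 characters is silently truncated, so size the buffer for your data.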

  7. #7
    Registered User Tonto's Avatar
    Join Date
    Jun 2005
    Location
    New York
    Posts
    1,465
    I like the xerces-c library for xml

  8. #8
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    If all of your data follows such a strict layout -- i.e., one tag per line etc -- then you could use some scripting to convert the data into a more easily parseable form. Just a thought. I wouldn't do this if your program will be run often in the future because it's not very robust, but a simple script like this could make your life a lot easier in C:
    Code:
    $ cat input
      <node id="4700-Z$0" x="322" y="-505" z="0.0" />
      <node id="4701" x="356" y="-454" z="0.0" />
      <node id="4702-H" x="402" y="-456" z="0.0" />
      <node id="4704" x="400" y="-428" z="0.0" />
      <node id="4705" x="455" y="-426" z="0.0" />
    $ perl -ne 'while(/\s(?:id|[xyz])="([^"]+)"/g) { print "$1 " } print "\n"' input
    4700-Z$0 322 -505 0.0
    4701 356 -454 0.0
    4702-H 402 -456 0.0
    4704 400 -428 0.0
    4705 455 -426 0.0
    $
    That's in the spirit of even more quick and dirty solutions. If this program is actually going to be used more than once, for sure look into libmxml. I've used it before too (thanks to zacs7!) and would highly recommend it. libexpat is good too.
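
    Once the data is in that whitespace-separated form, reading it from C could look something like the sketch below. The names graph_node, parse_flat_line and read_flat_nodes are invented for this example, and it assumes ids never contain spaces and fit in 15 characters.

```c
#include <stdio.h>

typedef struct {
    char node[16];
    int x;
    int y;
    double z;   /* the sample values are "0.0", so read them as double */
} graph_node;   /* hypothetical name, not from the thread */

/* Parse one converted line such as "4701 356 -454 0.0".
   Returns 1 if all four fields were read, 0 otherwise. */
int parse_flat_line(const char *line, graph_node *out)
{
    return sscanf(line, "%15s %d %d %lf",
                  out->node, &out->x, &out->y, &out->z) == 4;
}

/* Read up to max nodes from fp, one per line, and return how many
   were stored. Lines that do not parse (blank lines, for example)
   are skipped. */
size_t read_flat_nodes(FILE *fp, graph_node *nodes, size_t max)
{
    char line[256];
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        if (parse_flat_line(line, &nodes[n]))
            n++;
    }
    return n;
}
```

    Pipe the perl output into a file, fopen() it, and pass the FILE pointer to read_flat_nodes() along with an array large enough for your node count.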
    dwk

