Thread: Parsing XML with C without 3rd party libs

  1. #1
    Registered User
    Join Date
    Dec 2008
    Posts
    5

    Parsing XML with C without 3rd party libs

    I am having real problems parsing an XML file. Can anybody point me in the right direction for an algorithm to enable me to do this?

    Whilst I am not new to programming, it has been over 12 years since I did any C, and even then it was nothing significant, so please treat me as a complete newbie and keep it easy to understand.

    I have trawled the internet for a couple of days trying to find a solution to this, but haven't come up with anything apart from 3rd party stuff.

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Hey Mike,

    Please post a little of your XML file, and your C program that can't parse it correctly. I don't know of a link to what you want, but I know we have some sharp people on the forum.

    You show them a problem with a C program, and they're just very likely to show you how to fix that problem.

    And Welcome to the C forum.

  3. #3
    Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    There are a few ANSI C XML parser libraries; what's wrong with using them? E.g. Mini-XML (libmxml).

    It's a lot of work to write a standards-compliant XML parser, and even then you'd probably just be introducing new bugs.

    What I'd do is whip up a stack, parse each tag and push it onto the stack, then process it that way. For example:

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <tests>
    <test name="a">Is the sun actually hot?</test>
    <test name="b" />
    </tests>
    Then,
    Code:
    PUSH <?xml version="1.0" encoding="UTF-8"?>
    PUSH "tests" -- OPEN
    PUSH "test" -- OPEN, ATTRIBUTE name = "a"
    PUSH "Is the sun actually hot?" -- CONTENT
    PUSH "test" -- CLOSE
    PUSH "test" -- OPEN, ATTRIBUTE name = "b"
    PUSH "test" -- CLOSE
    PUSH "tests" -- CLOSE
    And that way you can process fairly malformed XML.

    You could push structures, something like:
    Code:
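    enum type { OPEN, CLOSE, CONTENT };   /* hypothetical: kind of item pushed */
    struct tag { char name[64]; };        /* hypothetical: an opening tag */
    struct content { char text[256]; };   /* hypothetical: text between tags */
    
    struct stack_element
    {
       enum type t;                       /* says which union member is valid */
       union content_u
       {
          struct tag open;
          struct content cont;
       } content;
    };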
    The same concept can be done with an array and a "z-index", but why would you when you can use a stack!? :-)
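A sketch of the "array and an index" variant, assuming a fixed maximum nesting depth and tag-name length (`MAX_DEPTH`, `MAX_NAME` and the function names are hypothetical). Opening tags are pushed; each closing tag must match the most recently opened one, which is the usual way a stack catches badly nested XML such as `<a><b></a>`.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 64      /* hypothetical limits for this sketch */
#define MAX_NAME  64

/* Fixed-size stack of the currently open tag names. */
struct tag_stack {
    char names[MAX_DEPTH][MAX_NAME];
    int top;              /* number of names on the stack */
};

/* Pushes an opening tag name of len characters.
   Returns 0 on success, -1 if the stack is full or the name too long. */
int stack_push(struct tag_stack *s, const char *name, size_t len)
{
    assert(s != NULL && name != NULL);
    if (s->top >= MAX_DEPTH || len >= MAX_NAME)
        return -1;
    memcpy(s->names[s->top], name, len);
    s->names[s->top][len] = '\0';
    s->top++;
    return 0;
}

/* Handles a closing tag: pops and returns 0 if it matches the most
   recently opened tag; returns -1 (and pops nothing) otherwise. */
int stack_pop_matching(struct tag_stack *s, const char *name, size_t len)
{
    assert(s != NULL && name != NULL);
    if (s->top == 0)
        return -1;                        /* close tag with nothing open */
    const char *open = s->names[s->top - 1];
    if (strlen(open) != len || strncmp(open, name, len) != 0)
        return -1;                        /* improperly nested */
    s->top--;
    return 0;
}
```

When the whole document has been read, `top` should be back to 0; anything left on the stack is an unclosed element.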
    Last edited by zacs7; 12-18-2008 at 05:15 AM.

  4. #4
    Registered User
    Join Date
    Dec 2008
    Posts
    5
    Hi,

    The following is the XML file from the beginning up to the end of the first entry.

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2">
    <Document>
    <Style id="encrypted"><IconStyle><Icon><href>http://maps.google.com/mapfiles/kml/pushpin/red-pushpin.png</href></Icon></IconStyle></Style>
    <Style id="open"><IconStyle><Icon><href>http://maps.google.com/mapfiles/kml/pushpin/grn-pushpin.png</href></Icon></IconStyle></Style>
    <Folder>
    <Placemark>
     <name>Avalon</name>
    <styleUrl>#encrypted</styleUrl>
    <description>
    <![CDATA[
    <p><b>BSSID:</b> 00:1e:5a:6e:9d:25
    <p><b>Caps:</b> [WPA-PSK-TKIP]
    <p><b>Freq:</b> 2.437000
    <p><b>Signal:</b> -89db
    <p><b>Speed (mph):</b> 22.370000
    <p><b>Bearing:</b> 284.609375
    <p><b>Last Seen:</b> dd mmm yyyy hh:mm:ss
    ]]>
    </description>
     <Point>
      <coordinates>-2.558100, 52.05510</coordinates>
     </Point>
    </Placemark>
    The code I have written so far:
    * opens the file;
    * creates a buffer the size of the file;
    * reads the data into the buffer ready for parsing.

    Currently that is the extent of my progress.
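The three steps above might look something like this minimal sketch (the name `read_whole_file` is hypothetical). It finds the size with `fseek`/`ftell`, which works for regular files on common platforms, and adds a terminating `'\0'` so the parser can use the standard string functions on the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads an entire file into a NUL-terminated, malloc'd buffer.
   Returns NULL on failure; the caller must free() the result.
   If out_len is not NULL it receives the number of bytes read. */
char *read_whole_file(const char *path, size_t *out_len)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);                  /* file size in bytes */
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }

    char *buf = malloc((size_t)size + 1);   /* +1 for the terminating '\0' */
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t n = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    buf[n] = '\0';
    if (out_len != NULL)
        *out_len = n;
    return buf;
}
```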

  5. #5
    Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Seems like a waste of memory to read the whole file in only to unload it moments later.

    I'd suggest:
    * create a buffer of 1K or similar;
    * fill the buffer with stuff from the file;
    * parse the buffer, adding the stuff to the stack, and read more of the file into the buffer if and when you need it.
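One way to sketch that suggestion, assuming the parser consumes one character at a time: a small reader struct holds a 1K chunk and only calls `fread` again when the chunk is used up. The names `struct reader`, `reader_init` and `reader_getc` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 1024   /* "1K or similar" */

/* Buffered reader: holds at most CHUNK_SIZE bytes of the file at a time. */
struct reader {
    FILE *fp;
    char buf[CHUNK_SIZE];
    size_t len;   /* bytes currently in buf */
    size_t pos;   /* index of the next byte to hand out */
};

void reader_init(struct reader *r, FILE *fp)
{
    assert(r != NULL && fp != NULL);
    r->fp = fp;
    r->len = r->pos = 0;
}

/* Returns the next character, refilling the buffer from the file only
   when it has been used up; returns EOF at end of file or on error. */
int reader_getc(struct reader *r)
{
    if (r->pos == r->len) {
        r->len = fread(r->buf, 1, CHUNK_SIZE, r->fp);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    return (unsigned char)r->buf[r->pos++];
}
```

Because the parser asks for characters rather than chunks, a tag that happens to straddle two 1K reads needs no special handling; the cost is a function call per character, which is usually negligible next to the disk I/O.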

  6. #6
    Registered User
    Join Date
    Dec 2008
    Posts
    5

    waste of memory

    You're right.

    I have been thinking the same thing, but I wanted to get the parsing of the XML out of the way before I tackled that.

    I am doing this as an exercise in re-learning C programming. I just wish that C had inbuilt functions for regular expressions. It would have made this so much easier.

    Never mind, though. If we don't set ourselves tough challenges, we won't progress.

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Mike, understand I'm just talking out my butt here, but anyway.

    From the looks of that file, it would be very easy to strip away all the XML markup inside the <> brackets, and of course the angle brackets themselves.
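A naive sketch of that stripping idea (the name `strip_tags` is hypothetical): copy every character that is not between a `<` and the next `>`. On the sample file it is not perfect; for instance, the closing `]]` of the CDATA section survives, because only the final `>` of `]]>` looks like markup to this loop.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst, dropping everything from '<' through '>'.
   dst has room for cap bytes; the output is always NUL-terminated
   and silently truncated if it does not fit.
   Returns the number of characters written, excluding the '\0'. */
size_t strip_tags(const char *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL && cap > 0);
    size_t n = 0;
    int in_tag = 0;
    for (; *src != '\0'; src++) {
        if (*src == '<')
            in_tag = 1;
        else if (*src == '>')
            in_tag = 0;
        else if (!in_tag && n + 1 < cap)
            dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```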

    Problem with that is you'll be left with a page of text, which will be totally without formatting or other XML tag info. So I'm thinking some of those tags should be read and processed, so your remaining data can be arranged properly on the output page.

    Can you look at the XML tags and pick out what you want the program to do when it encounters some (or all) of these tags?

    For example, any tag with a string inside could start a new, left-justified line, with its name printed followed by a colon, e.g.:

    Coordinates:

    If you could make up a list of those, and the action that should be taken (and the simpler the better), that seems like the next step.
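Such a list maps naturally onto a lookup table. This is a sketch under assumptions: the chosen tags, their labels and the function `apply_action` are hypothetical, and the value is whatever text the parser found between an element's open and close tags.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row per tag of interest: what to print when that tag is found. */
struct tag_action {
    const char *tag;     /* element name as it appears in the file */
    const char *label;   /* text printed before the element's value */
};

/* Hypothetical list for the KML sample; unlisted tags are skipped. */
static const struct tag_action actions[] = {
    { "name",        "Name" },
    { "styleUrl",    "Style" },
    { "coordinates", "Coordinates" },
};

/* Formats "Label: value" into out (cap bytes) for a listed tag.
   Returns 1 if the tag is in the table, 0 if it should be skipped. */
int apply_action(const char *tag, const char *value, char *out, size_t cap)
{
    assert(tag != NULL && value != NULL && out != NULL && cap > 0);
    for (size_t i = 0; i < sizeof actions / sizeof actions[0]; i++) {
        if (strcmp(actions[i].tag, tag) == 0) {
            snprintf(out, cap, "%s: %s", actions[i].label, value);
            return 1;
        }
    }
    out[0] = '\0';
    return 0;
}
```

Adding or changing an action then means editing one table row rather than another branch of parsing code.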

    Did you like Zac's idea of using a stack? I thought a FIFO buffer would be the trick to use here, but I'm not familiar with this XML stuff.

  8. #8
    Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Come to think of it, it does make more sense to use a queue :-).
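A queue hands items back in the order they were read, which suits a single pass over the document. Here is a sketch of a fixed-capacity circular FIFO of strings; `QCAP`, `QITEM`, `enqueue` and `dequeue` are hypothetical names and limits.

```c
#include <assert.h>
#include <string.h>

#define QCAP  16     /* hypothetical capacity */
#define QITEM 64     /* max length of one item, including the '\0' */

/* Circular FIFO of strings: items come out in the order they went in. */
struct queue {
    char items[QCAP][QITEM];
    int head;        /* index of the oldest item */
    int count;       /* number of items stored */
};

/* Appends s at the tail; returns 0, or -1 if the queue is full. */
int enqueue(struct queue *q, const char *s)
{
    assert(q != NULL && s != NULL);
    if (q->count == QCAP)
        return -1;
    int tail = (q->head + q->count) % QCAP;
    strncpy(q->items[tail], s, QITEM - 1);
    q->items[tail][QITEM - 1] = '\0';   /* strncpy may not terminate */
    q->count++;
    return 0;
}

/* Removes the oldest item into out (cap bytes); returns 0, or -1 if empty. */
int dequeue(struct queue *q, char *out, size_t cap)
{
    assert(q != NULL && out != NULL && cap > 0);
    if (q->count == 0)
        return -1;
    strncpy(out, q->items[q->head], cap - 1);
    out[cap - 1] = '\0';
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 0;
}
```

The stack is still useful alongside it for checking that close tags match open tags; the queue just preserves document order for whatever processes the items.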

  9. #9
    Registered User
    Join Date
    Dec 2008
    Posts
    5

    The Plan...

    The plan is:

    Not a scooby doo:
    Parse the data from multiple files that are entered onto the command line.

    Easy:
    Store the data in a data structure and write it to disk as a binary file.

    Easy:
    Generate an HTML file from the file that will display all the items on a map (Google).
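For the "store in a data structure and write it as a binary file" step, a sketch assuming a fixed-size record per Placemark (the struct layout and function names are hypothetical). KML writes `<coordinates>` as longitude, latitude, so `-2.558100, 52.05510` becomes `longitude = -2.5581`, `latitude = 52.0551`. Dumping structs with `fwrite` is simple, but the file is only guaranteed readable by a program built with the same compiler and platform, since padding and byte order can differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one KML Placemark; fixed-size fields keep
   every record the same length on disk. */
struct placemark {
    char name[64];
    char bssid[18];      /* "00:1e:5a:6e:9d:25" plus '\0' */
    double longitude;
    double latitude;
};

/* Writes count records to path; returns 0 on success, -1 on failure. */
int save_placemarks(const char *path, const struct placemark *p, size_t count)
{
    assert(path != NULL && (p != NULL || count == 0));
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(p, sizeof *p, count, fp);
    int close_failed = (fclose(fp) != 0);
    return (written == count && !close_failed) ? 0 : -1;
}

/* Reads up to max records back from path; returns how many were read. */
size_t load_placemarks(const char *path, struct placemark *p, size_t max)
{
    assert(path != NULL && p != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    size_t n = fread(p, sizeof *p, max, fp);
    fclose(fp);
    return n;
}
```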


    Opening the files and reading them into a buffer ready for parsing is done. When I have managed to parse this buffer into the data structure, I will go about optimising the design.

    Well that is the plan... :-S

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You have now sailed far beyond my navigational charts.

    Good luck, Mike!

  11. #11
    Jack of many languages Dino
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Mike, do you want to parse any XML, or just valid XML?

    To me it seems a linked chain of structures would work well for parsing XML. Something like this:

    Code:
    //
    //   Sketch: one node per element, linked into a tree via
    //   parent / first-child / next-sibling pointers.
    //
    enum FLAVOR { CDATA, PCDATA /* , etc. */ } ;
    
    struct giname { 
    	char name[100] ; 
    } ;
    
    struct attrs { 
    	struct giname * attrname ;  // points to a giname struct 
    	char * value ;              // pointer to the value of the attribute 
    } ;
    
    struct xmltag { 
    	struct giname * tagname ;   // points to a giname struct
    	struct attrs * attrlist ;   // points to an attrs array 
    	int attr_count ;            // count of attributes 
    	char * data ;               // points to the element value 
    	int  data_length ;          // length of above value 
    	struct xmltag * parent ;    // pointer to parent structure 
    	struct xmltag * child ;     // pointer to the first child (nested) element structure 
    	struct xmltag * next_sibling ; // pointer to the next sibling structure 
    	enum FLAVOR data_flavor ;   // some value of FLAVOR 
    } ; 
    However, if you also want to do validation and support namespaces, then that's yet another huge consideration, as you'll have to read and parse the DTD, or the schema for each of the namespaces, as well.

    And, of course, you have to be able to handle the myriad of encodings that are available too.
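To show how the parent / child / next_sibling links in the xmltag struct get filled in, here is a sketch using a cut-down node with only a name and those three links (the names `struct node`, `add_child` and `free_tree` are hypothetical). Each new element is appended as the last child of the element currently open, which during parsing is the top of the stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down version of xmltag: just the name and the tree links. */
struct node {
    char name[32];
    struct node *parent, *child, *next_sibling;
};

/* Allocates a node called name and links it in as the LAST child of
   parent (parent may be NULL for the root). Returns NULL if out of memory. */
struct node *add_child(struct node *parent, const char *name)
{
    assert(name != NULL);
    struct node *n = calloc(1, sizeof *n);   /* zeroes name, so it stays terminated */
    if (n == NULL)
        return NULL;
    strncpy(n->name, name, sizeof n->name - 1);
    n->parent = parent;
    n->child = n->next_sibling = NULL;
    if (parent != NULL) {
        struct node **link = &parent->child;
        while (*link != NULL)                /* walk to the end of the sibling chain */
            link = &(*link)->next_sibling;
        *link = n;
    }
    return n;
}

/* Frees n, all of its descendants, and all of its following siblings. */
void free_tree(struct node *n)
{
    while (n != NULL) {
        struct node *next = n->next_sibling;
        free_tree(n->child);
        free(n);
        n = next;
    }
}
```

Calling `free_tree` on the root releases the whole document, since the root has no siblings.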
    Mainframe assembler programmer by trade. C coder when I can.

  12. #12
    Jack of many languages Dino
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Quote (originally posted by mike_morley):
    The plan is:

    ...

    Easy:
    Generate a HTML file from the file that will display all the items on a map (google).

    ...

    Well that is the plan... :-S
    It seems to me that
    1) You already have an XML file (a Google kml file)
    2) You want to display some of that information - ie, you aren't changing anything, just changing the presentation of the data (for the worse, actually).

    Therefore, why read and parse it? Why not just throw your own custom XSLT style sheet at it and just display the data you want? I mean, that's pretty much the whole entire point of XSLT.

    Or, you could install DB2 Express-C for LUW (Linux, Unix and Windows), which is free (it's a relational database from IBM), store the kml document in a pureXML column in a table, and then use SQL with XML functions and XPath qualifications to get just the data you need, in exactly the output you want, with your own custom XSLT style sheet. I think that's what I would do.

  13. #13
    Registered User
    Join Date
    Dec 2008
    Posts
    5
    Thanks for all the replies.

    I know I could display the file by throwing an XSLT style sheet at it... If I were doing this for a project with a deadline, that is how I would do it.

    I am doing this as an exercise in re-learning how to program in C. The point of this exercise is manipulating data and creating data structures to hold it.

    Sometimes doing things the easy way isn't as much fun.

  14. #14
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    And sometimes you run across a Christopher Columbus.

    Happy sailing, Mike!


