Thread: Dissecting an Excel XML spreadsheet

  1. #1
    Registered User
    Join Date
    Dec 2007
    Posts
    30

    Dissecting an Excel XML spreadsheet

    I have a worksheet saved in XML Spreadsheet 2003 file format.

    I’m supposed to read a specific value from it, given it’s cell address, such as „A5“ or „B3“. As I have almost-absolutely no previous knowlege of XML, I though to ask you advice how to parse through this thing (most easily) without using a thirdparty XML parser or building one myself (which I couldn’t do anyway), before I start cracking.
    Looking through the XML file (shown at the end of the post), I see that I can easily ignore most of it. The first thing that I care about seems to be the <Worksheet ss:Name=“Sheet1“> tag. The next object of interest seems to be:


    Code:
    <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="2" x:FullColumns="1"
       x:FullRows="1" ss:DefaultRowHeight="15">
    Which tells me how large the spreadsheet is (ExpanedColumnCount and ExpandedRowCount), which in the current example is 2, 2 (which is B2 in Excel’s language).
    From there on comes the listing of rows and cells, like this:
    Code:
       <Row ss:AutoFitHeight="0"/>
       <Row ss:AutoFitHeight="0">
        <Cell ss:Index="2"><Data ss:Type="String">THISCELL!</Data></Cell>
       </Row>
    With „B2“ given as the address, I see that there’s no data on the first row. In the second row, however, is one cell value. What’s important with CELL tag’s is that it has an Index argument. Given in the example above, the B2 cell value („THISCELL!“) has no cells before it. So reading the index is crucial to count which cell value to read (or write).
    So, knowing all this, I thought of to solve this problem like this:
    1) Convert given Excel address to normal value ((ie „B2“ -> „2, 2“)
    2) Search for the starting tag of the worksheet.
    3) Count how many <Row> tags I have.
    4) Loop through the rows until given y coordinate (from address). Also stop if reach the end of the worksheet.
    5) Check if there are cells in the row, then read how many cells there are and what’s the first cell’s index is (if present).
    6) Loop through all cells until given x coordinate (subtracting first cell’s index value from it first).
    7) Read the value.
    What do you think ? Is it the optimal way to do it (quick and simple, using C’s strtok_r or something of C++’s equivalent) ?
    The main question ofcourse is how to parse the tags (to read their arguments and stuff). I’m worried that the same tag can be given in two ways:

    <Row/> or.. <Row>data here</Row>

    Any advice how to proceed with that ? An example or article or just some pointers how to start parsing those tags (reading their arguments and open and closing braces and so on and on and on..).

    Best wishes, D5

    P.S. The whole XML file (with one value at B2 location) used also in the example above:

    Code:
    <?xml version="1.0"?>
    <?mso-application progid="Excel.Sheet"?>
    <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
     xmlns:o="urn:schemas-microsoft-com:office:office"
     xmlns:x="urn:schemas-microsoft-com:office:excel"
     xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
     xmlns:html="http://www.w3.org/TR/REC-html40">
     <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
      <Author>Andrew</Author>
      <LastAuthor>Andrew</LastAuthor>
      <Created>2008-05-22T18:24:54Z</Created>
      <Version>12.00</Version>
     </DocumentProperties>
     <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
      <WindowHeight>11985</WindowHeight>
      <WindowWidth>24855</WindowWidth>
      <WindowTopX>240</WindowTopX>
      <WindowTopY>120</WindowTopY>
      <ProtectStructure>False</ProtectStructure>
      <ProtectWindows>False</ProtectWindows>
     </ExcelWorkbook>
     <Styles>
      <Style ss:ID="Default" ss:Name="Normal">
       <Alignment ss:Vertical="Bottom"/>
       <Borders/>
       <Font ss:FontName="Calibri" x:CharSet="186" x:Family="Swiss" ss:Size="11"
        ss:Color="#000000"/>
       <Interior/>
       <NumberFormat/>
       <Protection/>
      </Style>
     </Styles>
     <Worksheet ss:Name="Sheet1">
      <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="2" x:FullColumns="1"
       x:FullRows="1" ss:DefaultRowHeight="15">
       <Row ss:AutoFitHeight="0"/>
       <Row ss:AutoFitHeight="0">
        <Cell ss:Index="2"><Data ss:Type="String">THISCELL!</Data></Cell>
       </Row>
      </Table>
      <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
       <PageSetup>
        <Header x:Margin="0.3"/>
        <Footer x:Margin="0.3"/>
        <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
       </PageSetup>
       <Unsynced/>
       <Selected/>
       <Panes>
        <Pane>
         <Number>3</Number>
         <ActiveRow>1</ActiveRow>
         <ActiveCol>1</ActiveCol>
        </Pane>
       </Panes>
       <ProtectObjects>False</ProtectObjects>
       <ProtectScenarios>False</ProtectScenarios>
      </WorksheetOptions>
     </Worksheet>
     <Worksheet ss:Name="Sheet2">
      <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1" x:FullColumns="1"
       x:FullRows="1" ss:DefaultRowHeight="15">
       <Row ss:AutoFitHeight="0"/>
      </Table>
      <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
       <PageSetup>
        <Header x:Margin="0.3"/>
        <Footer x:Margin="0.3"/>
        <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
       </PageSetup>
       <Unsynced/>
       <ProtectObjects>False</ProtectObjects>
       <ProtectScenarios>False</ProtectScenarios>
      </WorksheetOptions>
     </Worksheet>
     <Worksheet ss:Name="Sheet3">
      <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1" x:FullColumns="1"
       x:FullRows="1" ss:DefaultRowHeight="15">
       <Row ss:AutoFitHeight="0"/>
      </Table>
      <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
       <PageSetup>
        <Header x:Margin="0.3"/>
        <Footer x:Margin="0.3"/>
        <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
       </PageSetup>
       <Unsynced/>
       <ProtectObjects>False</ProtectObjects>
       <ProtectScenarios>False</ProtectScenarios>
      </WorksheetOptions>
     </Worksheet>
    </Workbook>

  2. #2
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    The problem is, your program may not work with all valid XML spreadsheets if you are making assumptions about the format (other than that is required by the XML standard). A cleaner solution would be to use a third party XML library (why not?).

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Reading/Writing an Excel Spreadsheet?
    By Blackroot in forum Windows Programming
    Replies: 3
    Last Post: 10-15-2007, 09:59 PM
  2. Replies: 2
    Last Post: 11-30-2006, 08:04 AM
  3. Replies: 1
    Last Post: 05-05-2006, 11:17 PM
  4. Exporting to spreadsheet (excel or other)
    By jpiepgrass in forum C Programming
    Replies: 3
    Last Post: 07-30-2004, 05:44 PM
  5. Is it possible to write into an Excel spreadsheet?
    By rtsc17 in forum C++ Programming
    Replies: 6
    Last Post: 09-18-2003, 10:15 AM