Thread: Search through HTML

  1. #1
    Registered User
    Join Date
    Nov 2006
    Posts
    19

    Search through HTML

    I have a large file in an HTML-like format, and I've included a sample of it below. I'm trying to find the contents between a specific pair of tags. In this case, I want to find and store whatever is between <QUAL> and </QUAL>, once for every <QUAL> tag that appears. Can someone steer me in the right direction on what kind of code I need to write to do this?

    <TUV xml:lang='en-US'>Interfaces</TUV>
    </NAME>
    <DESC>
    <TUV xml:lang='en-US'>(System category)</TUV>
    </DESC>
    <QUAL>COM.INTERFACES</QUAL>
    </ATTRCAT>
    <ATTRCAT id='_0x01804c20' usage='sys'>
    <NAME>
    <TUV xml:lang='en-US'>Initialization</TUV>
    </NAME>
    <DESC>

    Thanks

  2. #2
    {Jaxom,Imriel,Liam}'s Dad Kennedy's Avatar
    Join Date
    Aug 2006
    Location
    Alabama
    Posts
    1,065
    Scan through the stream looking for your opening tag. When you find it, read everything up to the closing tag into a large array (one that is more than large enough to hold the string). Terminate the array with a '\0'. There is your string!
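
    A minimal sketch of that approach in C. It assumes the tag's first character ('<') does not occur again inside the tag, so restarting the match on a mismatch is enough. The names skip_to and read_until are hypothetical helpers for illustration, not standard functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Advance the stream until `tag` has just been read.
   Returns 1 if the tag was found, 0 if end of file came first.
   Assumption: tag[0] does not appear again later in the tag. */
static int skip_to(FILE *f, const char *tag)
{
    size_t matched = 0;
    int c;

    assert(f != NULL && tag != NULL);
    while (tag[matched] != '\0' && (c = fgetc(f)) != EOF) {
        if (c == tag[matched])
            matched++;                        /* one more character agrees */
        else
            matched = (c == tag[0]) ? 1 : 0;  /* restart the match */
    }
    return tag[matched] == '\0';
}

/* Copy characters into `out` until `close_tag` has been read, then cut the
   closing tag off and terminate the string with '\0'.
   Returns 1 on success, 0 on end of file or if `out` is too small. */
static int read_until(FILE *f, const char *close_tag, char *out, size_t cap)
{
    size_t clen, n = 0;
    int c;

    assert(f != NULL && close_tag != NULL && out != NULL);
    clen = strlen(close_tag);
    while ((c = fgetc(f)) != EOF) {
        if (n + 1 >= cap)
            return 0;                         /* array not large enough */
        out[n++] = (char)c;
        if (n >= clen && memcmp(out + n - clen, close_tag, clen) == 0) {
            out[n - clen] = '\0';
            return 1;
        }
    }
    return 0;
}
```

    Calling skip_to(f, "<QUAL>") and then read_until(f, "</QUAL>", buf, sizeof buf) in a loop, until skip_to returns 0, visits each value in turn.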

  3. #3
    Registered User
    Join Date
    Nov 2006
    Posts
    19

    Follow up

    Thanks for your help. What code do I write to scan through the stream looking for the tag? I'm a mediocre programmer, so I'm not sure what I would write to do that.

  4. #4
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    for example
    Code:
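    /* needs <stdio.h> and <stdlib.h>; f is assumed to be a FILE * from fopen(name, "rb") */
    long size;
    fseek(f, 0, SEEK_END);              /* get size of the file */
    size = ftell(f);
    rewind(f);
    size++;                             /* room for the terminating 0 */
    char *buffer = malloc(size * sizeof *buffer);
    if (buffer)
    {
       size_t n = fread(buffer, 1, size - 1, f);
       buffer[n] = 0;                   /* terminate after the bytes actually read */
       while (/* substring found in the buffer */)
       {
          /* do your action */
       }
       free(buffer);
    }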
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler
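
    One possible way to fill in the "substring found in the buffer" loop is the standard strstr() function. The sketch below assumes the whole file is already in a '\0'-terminated buffer and that the file itself contains no '\0' bytes. next_between and print_quals are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find the next text enclosed by `open` and `close` at or after `from`.
   Returns a pointer to the first enclosed character and stores the length
   of the enclosed text in *len, or returns NULL if no complete pair remains. */
static const char *next_between(const char *from, const char *open,
                                const char *close, size_t *len)
{
    const char *start, *end;

    assert(from != NULL && open != NULL && close != NULL && len != NULL);
    start = strstr(from, open);
    if (start == NULL)
        return NULL;                 /* no further opening tag */
    start += strlen(open);           /* step over the opening tag */
    end = strstr(start, close);
    if (end == NULL)
        return NULL;                 /* opening tag without a closing tag */
    *len = (size_t)(end - start);
    return start;
}

/* Print every <QUAL>...</QUAL> value in the buffer, one per line.
   Returns the number of values found. */
static size_t print_quals(const char *buffer)
{
    size_t len = 0, count = 0;
    const char *p = buffer;

    while ((p = next_between(p, "<QUAL>", "</QUAL>", &len)) != NULL) {
        printf("%.*s\n", (int)len, p);     /* the "do your action" step */
        p += len + strlen("</QUAL>");      /* continue after the closing tag */
        count++;
    }
    return count;
}
```

    The matches are not copied, so they are not '\0'-terminated; use the returned length, or memcpy() them into your own array if you need to store them.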

  5. #5
    Registered User
    Join Date
    Nov 2006
    Posts
    19
    Is "substring found in buffer" supposed to refer to me looking for the tag <QUAL>? If so, how do I look for <QUAL>? Thanks again for your help.

  6. #6
    Registered User
    Join Date
    Apr 2006
    Posts
    58

    Re: Search through HTML

    Is the entire file in XML format or HTML format? If you are trying to parse an XML file with C code, try using the Expat parser: http://expat.sourceforge.net/

    Sam

  7. #7
    Registered User
    Join Date
    Aug 2005
    Location
    Austria
    Posts
    1,990
    strstr() might be a good function to scan a buffer for a string.
    But be aware that if you use fread() to read a fixed-size chunk of text into a buffer, the substring you are looking for might not be completely in the buffer: it can be split across two reads. You will have to implement some kind of "moving window".
    Kurt
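
    A rough sketch of one way such a moving window could look. It assumes the text contains no '\0' bytes and that every element fits inside the window. extract_all and its window parameter are hypothetical, made up for this example. Anything that might be the start of an unfinished element is carried over to the front of the buffer before the next fread().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the text of every open...close element in `in` to `out`, one per
   line, reading at most `window` bytes at a time.
   Returns the number of elements found, or -1 on allocation failure or if
   an element does not fit inside the window. */
static long extract_all(FILE *in, const char *open, const char *close,
                        FILE *out, size_t window)
{
    size_t olen, clen, have = 0;
    long count = 0;
    char *buf;

    assert(in != NULL && out != NULL && open != NULL && close != NULL);
    olen = strlen(open);
    clen = strlen(close);
    assert(olen > 0 && clen > 0 && window > olen + clen);

    buf = malloc(window + 1);               /* +1 for the terminating '\0' */
    if (buf == NULL)
        return -1;

    for (;;) {
        size_t n = fread(buf + have, 1, window - have, in);
        char *p = buf, *start, *end = NULL;

        have += n;
        buf[have] = '\0';

        /* Report every element that lies completely inside the window. */
        while ((start = strstr(p, open)) != NULL &&
               (end = strstr(start + olen, close)) != NULL) {
            fprintf(out, "%.*s\n", (int)(end - (start + olen)), start + olen);
            count++;
            p = end + clen;
        }
        if (n == 0)
            break;                          /* nothing new was read: done */

        if (start != NULL) {
            p = start;                      /* unfinished element: keep it all */
        } else if ((size_t)(buf + have - p) > olen - 1) {
            p = buf + have - (olen - 1);    /* keep a possibly split opening tag */
        }
        have = (size_t)(buf + have - p);
        if (have == window) {               /* element longer than the window */
            free(buf);
            return -1;
        }
        memmove(buf, p, have);              /* slide the kept tail to the front */
    }
    free(buf);
    return count;
}
```

    If the file is small enough to fit in memory, reading it in one go as suggested earlier in the thread avoids this bookkeeping entirely.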
