-
Search through HTML
I have a large file in an HTML-like format; a sample is included below. I'm trying to find the contents between a specific pair of tags. In this case, I want to find and store whatever is between <QUAL> and </QUAL>, once for every <QUAL> tag that appears. Can someone steer me in the right direction on what kind of code I need to write to do this?
<TUV xml:lang='en-US'>Interfaces</TUV>
</NAME>
<DESC>
<TUV xml:lang='en-US'>(System category)</TUV>
</DESC>
<QUAL>COM.INTERFACES</QUAL>
</ATTRCAT>
<ATTRCAT id='_0x01804c20' usage='sys'>
<NAME>
<TUV xml:lang='en-US'>Initialization</TUV>
</NAME>
<DESC>
Thanks
-
Scan through the stream looking for your opening tag. When you find it, read everything up to the closing tag into an array that is more than large enough to hold the string, then terminate the array with a '\0'. There is your string!
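As a rough sketch of those steps, assuming the text has already been read into a '\0'-terminated buffer: the function name extract_first and its parameters are made up for illustration, and it uses strstr() from <string.h> to locate the tags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first `open` tag and the
 * following `close` tag in `text` into `out` (capacity `outsz`, including
 * the terminating '\0'). Text longer than the buffer is truncated.
 * Returns the number of characters copied, or -1 if no complete pair is
 * found or outsz is 0. */
long extract_first(const char *text, const char *open, const char *close,
                   char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;

    const char *start = strstr(text, open);
    if (start == NULL)
        return -1;
    start += strlen(open);                 /* skip past the opening tag */

    const char *end = strstr(start, close);
    if (end == NULL)
        return -1;

    size_t len = (size_t)(end - start);
    if (len > outsz - 1)
        len = outsz - 1;                   /* leave room for '\0' */

    memcpy(out, start, len);
    out[len] = '\0';                       /* terminate the array */
    return (long)len;
}
```

Called with "<QUAL>" and "</QUAL>" on the sample above, this should leave COM.INTERFACES in the output array.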
-
Follow up
Thanks for your help. What code do I write to scan through the stream looking for the tag? I'm a mediocre programmer, so I'm not sure what I would write to do that.
-
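For example:
Code:
/* get the size of the file into size */
size++;    /* one extra byte for the terminating '\0' */
unsigned char *buffer = malloc(size * sizeof *buffer);
if (buffer)
{
    /* f is the already opened FILE* */
    fread(buffer, 1, size - 1, f);
    buffer[size - 1] = 0;
    while (substring found in the buffer)
    {
        /* do your action */
    }
    free(buffer);
}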
-
Is "substring found in the buffer" supposed to refer to me looking for the <QUAL> tag? If so, how do I look for <QUAL>? Thanks again for your help.
-
Re: Search through HTML
Is the entire file in XML format or HTML format? If you are trying to parse an XML file with C code, try using the Expat parser: http://expat.sourceforge.net/
Sam
-
strstr() might be a good function to scan a buffer for a string.
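A minimal sketch of how strstr() could be used to step through every tag pair in a '\0'-terminated buffer. The name next_between and the cursor-style interface are only one possible design, not anything standard.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the next `open` tag at or after *cursor.
 * On success it returns a pointer to the text right after the tag, stores
 * the length of that text (up to the matching `close`) in *len, and moves
 * *cursor past the closing tag. The returned text is NOT '\0'-terminated.
 * Returns NULL when no further complete pair exists. */
const char *next_between(const char **cursor, const char *open,
                         const char *close, size_t *len)
{
    const char *start = strstr(*cursor, open);
    if (start == NULL)
        return NULL;
    start += strlen(open);

    const char *end = strstr(start, close);
    if (end == NULL)
        return NULL;

    *len = (size_t)(end - start);
    *cursor = end + strlen(close);   /* resume after the closing tag */
    return start;
}
```

Start with the cursor pointing at the beginning of the buffer and call this in a loop until it returns NULL; each hit can be printed with printf("%.*s\n", (int)len, s) or copied elsewhere.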
But be aware that if you use fread to read a fixed size of text into a buffer, the substring you are looking for might not be completely in the buffer. You will have to implement some kind of "moving window".
Kurt
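One way such a moving window might look, as a sketch only. The names print_quals and WINDOW are invented here, and it assumes the file is plain text with no '\0' bytes and that no <QUAL> element is longer than the window (longer ones are dropped).

```c
#include <stdio.h>
#include <string.h>

#define WINDOW 4096  /* assumed size; must exceed the longest <QUAL>...</QUAL> */

/* Hypothetical helper: writes each <QUAL> value found in `in` to `out`,
 * one per line, and returns how many were found. */
size_t print_quals(FILE *in, FILE *out)
{
    static const char open[] = "<QUAL>", close[] = "</QUAL>";
    const size_t open_len = sizeof open - 1, close_len = sizeof close - 1;
    char buf[WINDOW + 1];
    size_t have = 0, count = 0, n;

    while ((n = fread(buf + have, 1, WINDOW - have, in)) > 0) {
        have += n;
        buf[have] = '\0';

        char *p = buf;   /* start of the data not yet consumed */
        char *tag;
        while ((tag = strstr(p, open)) != NULL) {
            char *start = tag + open_len;
            char *end = strstr(start, close);
            if (end == NULL) {
                p = tag; /* closing tag not read yet: keep from the opening tag */
                break;
            }
            fprintf(out, "%.*s\n", (int)(end - start), start);
            count++;
            p = end + close_len;
        }
        if (tag == NULL) {
            /* No opening tag left: keep only a tail short enough to be
             * the beginning of a "<QUAL>" split across two reads. */
            size_t rest = have - (size_t)(p - buf);
            if (rest > open_len - 1)
                p = buf + have - (open_len - 1);
        }

        /* Slide the unconsumed tail to the front of the window. */
        have -= (size_t)(p - buf);
        memmove(buf, p, have);
        if (have == WINDOW)
            have = 0;    /* element longer than the window: give up on it */
    }
    return count;
}
```

The point of the design is that the buffer never has to hold the whole file: only the unfinished tail of each read is carried over into the next one.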