Parsing HTML files
Can anyone suggest a good method to parse HTML files?
I started out with just a linked list in which each node holds a vector of strings: the HTML tag name and the data inside that tag.
Can anyone recommend a better way of doing it?
My linked list would have something like the following:
NODE1 -> vector[0] = "html", vector[1] = "<head><title>hello</title>"
NODE2 -> vector[0] = "head", vector[1] = "<title>hello</title>"
NODE3 -> vector[0] = "title", vector[1] = "hello"
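For reference, the layout described above could be sketched in C++ roughly like this. It assumes each node's vector holds exactly two strings (tag name first, then the raw contents); the names Node, tag_of, content_of, and make_example_list are made up for illustration and are not from any library.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Assumed convention: each node is a two-element vector,
// element 0 = tag name, element 1 = raw text inside that tag.
using Node = std::vector<std::string>;

// Accessors that check the two-element convention before indexing.
const std::string& tag_of(const Node& n) {
    assert(n.size() == 2);
    return n[0];
}

const std::string& content_of(const Node& n) {
    assert(n.size() == 2);
    return n[1];
}

// Builds the three-node example list by hand (no parsing involved).
std::list<Node> make_example_list() {
    return {
        Node{"html", "<head><title>hello</title>"},
        Node{"head", "<title>hello</title>"},
        Node{"title", "hello"},
    };
}
```

Note that the same text ends up stored several times, once per enclosing tag, which is one reason a flat list gets awkward for deeply nested pages.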
> Can anyone recommend a better way of doing it?
Well, what are you going to do with it next?
Like most things, there isn't a "best" way, but there are some "better" and "worse" ways depending to some extent on what you're trying to achieve.
I'm just trying to write a generic class that parses all tags and their associated data, so that in the future, if I need to pull certain data out of an HTML page, I can use it.
Mostly, I'm looking to write a little program to "play stocks" and see how well it does ;O
I'm not sure where I plan to get the data from at the moment, and I realize it would be much easier to just parse specifically for the data I need, but I figured that, as an exercise, I would parse the whole page "generically" so that maybe I can use it in the future.
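One common alternative to a flat list for this kind of generic parsing is a tree, where each element owns its nested elements. The following is only a minimal sketch under strong assumptions: the input is well-formed and consists solely of <name>, </name>, and plain text. Attributes, comments, entities, and unclosed tags such as <br> are not handled, so real-world pages would need considerably more work. Element, parse_markup, and find_first are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type: a tag name, the text directly inside it,
// and any nested elements.
struct Element {
    std::string tag;
    std::string text;
    std::vector<std::unique_ptr<Element>> children;
};

// Parses well-formed markup made only of <name>, </name>, and plain text.
// Attributes, comments, and unclosed tags are NOT handled (assumption).
std::unique_ptr<Element> parse_markup(const std::string& input) {
    auto root = std::make_unique<Element>();
    root->tag = "#root";                     // synthetic container element
    std::vector<Element*> open{root.get()};  // stack of currently open elements

    std::size_t pos = 0;
    while (pos < input.size()) {
        assert(!open.empty());  // the synthetic root is never popped
        if (input[pos] == '<') {
            std::size_t end = input.find('>', pos);
            if (end == std::string::npos) break;  // unterminated tag: stop
            std::string name = input.substr(pos + 1, end - pos - 1);
            if (!name.empty() && name[0] == '/') {
                // Closing tag: pop the innermost element, but keep the root.
                if (open.size() > 1) open.pop_back();
            } else {
                // Opening tag: attach a new child and make it the current element.
                auto child = std::make_unique<Element>();
                child->tag = name;
                Element* raw = child.get();
                open.back()->children.push_back(std::move(child));
                open.push_back(raw);
            }
            pos = end + 1;
        } else {
            // Plain text: everything up to the next '<' belongs to the current element.
            std::size_t end = input.find('<', pos);
            if (end == std::string::npos) end = input.size();
            open.back()->text += input.substr(pos, end - pos);
            pos = end;
        }
    }
    return root;
}

// Depth-first search for the first element with the given tag name.
const Element* find_first(const Element& node, const std::string& tag) {
    if (node.tag == tag) return &node;
    for (const auto& child : node.children) {
        if (const Element* hit = find_first(*child, tag)) return hit;
    }
    return nullptr;
}
```

With this shape, pulling out a single value later is a lookup such as find_first(*root, "title") followed by reading its text member, and each piece of text is stored only once instead of once per enclosing tag.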