Thread: Parsing HTML files

    Can anyone suggest a good method for parsing HTML files?

    I started out using just a linked list, with each node holding a vector of strings: the HTML tag name and the data inside that tag.

    Can anyone recommend a better way of doing it?



    My linked list would have something like the following:

    NODE1 -> vector[0] = "html"  vector[1] = "<head><title>hello</title></head>"

    NODE2 -> vector[0] = "head"  vector[1] = "<title>hello</title>"

    NODE3 -> vector[0] = "title" vector[1] = "hello"
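
    As a rough sketch, the layout above could be written in C++ along these lines. It assumes std::list for the linked list and std::vector<std::string> for each node; the names TagEntry and build_example_list are made up for illustration, and the closing </head> in the first node is assumed.

```cpp
#include <list>
#include <string>
#include <vector>

// One list node as described in the post:
// element 0 is the tag name, element 1 is the raw markup inside that tag.
using TagEntry = std::vector<std::string>;

// Hypothetical helper (not from the post): builds the three example nodes
// for the markup <html><head><title>hello</title></head></html>.
std::list<TagEntry> build_example_list() {
    std::list<TagEntry> nodes;
    nodes.push_back(TagEntry{"html", "<head><title>hello</title></head>"});
    nodes.push_back(TagEntry{"head", "<title>hello</title>"});
    nodes.push_back(TagEntry{"title", "hello"});
    return nodes;
}
```

    Note that every outer node repeats the text of everything nested inside it, so the same characters are stored once per nesting level.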


    > Can anyone recommend a better way of doing it?
    Well, what are you going to do with it next?

    Like most things, there isn't a "best" way, but there are some "better" and "worse" ways, depending to some extent on what you're trying to achieve.

    I'm just trying to write a generic class that parses all tags and their associated data, so that if I need to pull certain data out of an HTML page in the future, I can reuse it.

    Mostly, I'm looking to write a little program to "play stocks" and see how well it does ;O

    I'm not sure where I plan to get the data from at the moment, and I realize it would be much easier to parse specifically for the data I need, but I figured that, as an exercise, I would parse the whole page "generically" so that maybe I can use it in the future.
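
    For the "generic" approach, one common alternative to a flat list is a tree in which each tag owns its nested tags. The following is only a minimal sketch of that idea, not a full HTML parser: Node and parse_tags are hypothetical names, it needs C++14 for std::make_unique, and it assumes well-formed markup with no attributes, comments, void tags, or entities.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type (not from the thread): one element in a tag tree.
struct Node {
    std::string tag;                              // e.g. "title"; empty for the root
    std::string text;                             // text directly inside this tag
    std::vector<std::unique_ptr<Node>> children;  // nested tags, in document order
};

// Parses markup such as <a><b>text</b></a> into a tree.
// Returns a synthetic root whose children are the top-level tags.
std::unique_ptr<Node> parse_tags(const std::string& html) {
    auto root = std::make_unique<Node>();
    std::vector<Node*> open{root.get()};  // stack of currently open elements
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] == '<') {
            std::size_t end = html.find('>', pos);
            if (end == std::string::npos) {
                break;  // unterminated tag: stop parsing
            }
            std::string name = html.substr(pos + 1, end - pos - 1);
            if (!name.empty() && name[0] == '/') {
                // Closing tag: pop only if it matches the innermost open element.
                if (open.size() > 1 && open.back()->tag == name.substr(1)) {
                    open.pop_back();
                }
            } else {
                // Opening tag: attach a new child and make it the current element.
                auto child = std::make_unique<Node>();
                child->tag = name;
                Node* raw = child.get();
                open.back()->children.push_back(std::move(child));
                open.push_back(raw);
            }
            pos = end + 1;
        } else {
            // Plain text up to the next tag belongs to the current element.
            std::size_t end = html.find('<', pos);
            if (end == std::string::npos) {
                end = html.size();
            }
            open.back()->text += html.substr(pos, end - pos);
            pos = end;
        }
    }
    return root;
}
```

    With a tree like this, each piece of text is stored once, and finding a value means walking the children by tag name. Real pages contain attributes, comments, and unclosed tags that this sketch does not handle, so it is only a starting point for the exercise.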

