Parsing HTML files

**slcjoey** · 08-28-2005

Can anyone suggest a good method to parse html files?

I started out with a linked list in which each node holds a vector of strings: the HTML tag name and the data inside that tag.

Can anyone recommend a better way of doing it?

example:

<html><head><title>hello</title></head>

My linked list would have something like the following:

NODE1 -> vector[0] = "html", vector[1] = "<head><title>hello</title></head>"

NODE2 -> vector[0] = "head", vector[1] = "<title>hello</title>"

NODE3 -> vector[0] = "title", vector[1] = "hello"

etc.

**Salem** · 08-28-2005

> Can anyone recommend a better way of doing it?

Well, what are you going to do with it next?

Like most things, there isn't a "best" way, but there are some "better" and "worse" ways depending to some extent on what you're trying to achieve.

**slcjoey** · 08-28-2005

I'm just trying to write a generic class that parses all tags and their associated data, so that in the future, if I need to pull certain data out of an HTML page, I can reuse it.

Mostly, I'm looking to write a little program to "play stocks" and see how well it does ;O

I'm not sure where I'll get the data from yet. I realize it would be much easier to parse specifically for the data I need, but I figured that, as an exercise, I would parse the whole page "generically" so that I can maybe reuse it in the future.
