Parsing HTML files


  1. #1
    Registered User
    Join Date
    Mar 2005
    Posts
    22

    Parsing HTML files

    Can anyone suggest a good method to parse html files?

    I started out using just a linked list with each node containing a vector of strings that contain the html tag and the data inside that tag.

    Can anyone recommend a better way of doing it?

    example:

    <html><head><title>hello</title></head>

    My linked list would have something like the following:

    NODE1 -> vectorofstring[0] = "html", vectorofstring[1] = "<head><title>hello</title></head>"

    NODE2 -> vectorofstring[0] = "head", vectorofstring[1] = "<title>hello</title>"

    NODE3 -> vectorofstring[0] = "title", vectorofstring[1] = "hello"

    etc..
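
For reference, the layout described above could be sketched roughly like this. It is only a minimal illustration: `TagEntry` and `collectTags` are made-up names, and it assumes simple tags with no attributes, no comments, and no element nested inside another element of the same name. If a tag is never closed, everything up to the end of the input is taken as its data.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// One node per opening tag: entry[0] is the tag name,
// entry[1] is the raw text found inside that tag.
using TagEntry = std::vector<std::string>;

// Hypothetical helper (not from any library): walks the input once and
// records every opening tag together with the text up to its closing tag.
std::list<TagEntry> collectTags(const std::string& html) {
    std::list<TagEntry> nodes;
    std::size_t pos = 0;
    while ((pos = html.find('<', pos)) != std::string::npos) {
        const std::size_t end = html.find('>', pos);
        if (end == std::string::npos) break;            // unterminated tag: stop
        const std::string name = html.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (name.empty() || name[0] == '/') continue;   // skip closing tags
        // Data runs to the first matching closing tag,
        // or to the end of the input if there is none.
        const std::size_t close = html.find("</" + name + ">", pos);
        const std::string inner = (close == std::string::npos)
                                      ? html.substr(pos)
                                      : html.substr(pos, close - pos);
        nodes.push_back({name, inner});
    }
    return nodes;
}
```

Calling `collectTags("<html><head><title>hello</title></head>")` gives three nodes, one each for html, head, and title, in that order. Note that the same text gets stored several times (once per enclosing tag), which is one reason this layout gets expensive on large pages.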

  2. #2
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,592
    > Can anyone recommend a better way of doing it?
    Well what are you going to do with it next?

    Like most things, there isn't a "best" way, but there are some "better" and "worse" ways depending to some extent on what you're trying to achieve.

  3. #3
    Registered User
    Join Date
    Mar 2005
    Posts
    22
    I'm just trying to write a generic class that parses all tags and their associated data, so that if I need to pull certain data out of an HTML page in the future I can reuse it.

    Mostly, I'm looking to write a little program to "play stocks" and see how well it does ;O

    I'm not sure where I plan to get the data from at the moment, and I realize it would be much easier to parse specifically for the data I need, but I figured that, as an exercise, I would parse the whole page "generically" so that maybe I can use it in the future.
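
One possible shape for such a generic parser is a tree instead of a flat list, so that each element's text is stored once and nesting is explicit. This is a swapped-in alternative, not the approach from the first post, and only a rough sketch: `Element` and `parseToTree` are hypothetical names, and it assumes C++14 (for `std::make_unique`) and input with simple tags only, with no attributes, comments, or self-closing elements. Real-world HTML needs far more care than this.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Element {
    std::string tag;                                 // tag name; empty for the synthetic root
    std::string text;                                // text found directly inside this element
    std::vector<std::unique_ptr<Element>> children;  // nested elements, in document order
};

// Hypothetical helper: builds a tree using a stack of currently open elements.
std::unique_ptr<Element> parseToTree(const std::string& html) {
    auto root = std::make_unique<Element>();
    std::vector<Element*> open{root.get()};          // innermost open element is open.back()
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] != '<') {
            // Plain text: attach it to the innermost open element.
            std::size_t next = html.find('<', pos);
            if (next == std::string::npos) next = html.size();
            open.back()->text += html.substr(pos, next - pos);
            pos = next;
            continue;
        }
        const std::size_t end = html.find('>', pos);
        if (end == std::string::npos) break;         // unterminated tag: stop
        const std::string name = html.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (name.empty()) continue;                  // ignore "<>"
        if (name[0] == '/') {
            // Closing tag: pop only if it matches the innermost open element.
            if (open.size() > 1 && open.back()->tag == name.substr(1)) open.pop_back();
        } else {
            auto child = std::make_unique<Element>();
            child->tag = name;
            Element* raw = child.get();
            open.back()->children.push_back(std::move(child));
            open.push_back(raw);
        }
    }
    return root;                                     // unclosed elements are simply left open
}
```

With the example from the first post, `parseToTree` returns a root whose only child is html, which contains head, which contains title with the text "hello". Pulling out specific data later then becomes a matter of walking the tree and comparing `tag` values.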
