Thread: Trying to grab the HTML from a Page...

  1. #1
    Registered User
    Join Date
    Sep 2006
    Posts
    7

    Trying to grab the HTML from a Page...

    Ya, I am simply trying to grab the html from a web page. I've tried downloading and using cURL... it seems like a very powerful tool but a bit confusing, and I think I'm just making it too hard. Any suggestions? Thanks.

  2. #2
    Sanity is for the weak! beene's Avatar
    Join Date
    Jul 2006
    Posts
    321
    Why would you want to do that?

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Networking question, methinks, so moved to Networking/Device Communication.

    You probably could read a tutorial such as Beej's Guide to Network Programming for the basics.

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    7
    I'm trying to update some records with information provided from a county web site. Ya, they put me in charge of doing this by hand, so I'm trying to write a program to do this for me... : ). That's why we have computers, right?

  5. #5
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    I would use a more suitable programming language, where such things are a little easier, such as Perl or PHP.

  6. #6
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    curl is extremely useful when working with webpages... if you can get it working with PHP (I'm 90% sure there is a curl extension for php) then more power to you. Otherwise, C++ strings and curl still make a great pair.

    Raw sockets (with PHP or C/C++) are usually more trouble than they're worth: you have to deal with premature data (some servers (IIS, I think) send something like a 100 Continue while the request is being made...), parse out the headers, and dechunk the body. And only then do you get to parse the data...
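
    To give a rough idea of the bookkeeping involved, here is a minimal C++ sketch of those steps applied to a response that has already been read into a string. The names HttpResponse, split_response and dechunk are made up for this illustration (they are not part of curl or any other library), and the code assumes a complete, well-formed response with CRLF line endings; chunk extensions and trailers are simply ignored.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical type for this sketch: the final response, split in two.
struct HttpResponse {
    std::string headers;  // status line plus header lines of the final response
    std::string body;     // everything after the blank line, still undecoded
};

// Split a raw response into headers and body, skipping interim 1xx
// responses such as "HTTP/1.1 100 Continue".
HttpResponse split_response(const std::string& raw) {
    std::size_t pos = 0;
    for (;;) {
        std::size_t end = raw.find("\r\n\r\n", pos);
        if (end == std::string::npos)
            throw std::runtime_error("incomplete header block");
        std::string head = raw.substr(pos, end - pos);
        pos = end + 4;  // step over the blank line

        // The status code follows the first space: "HTTP/1.1 200 OK".
        std::size_t sp = head.find(' ');
        if (sp == std::string::npos || sp + 1 >= head.size())
            throw std::runtime_error("malformed status line");
        if (head[sp + 1] != '1')  // not a 1xx interim response
            return HttpResponse{head, raw.substr(pos)};
    }
}

// Decode a body sent with "Transfer-Encoding: chunked".
std::string dechunk(const std::string& body) {
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos)
            throw std::runtime_error("missing chunk size line");
        // The size is hexadecimal; std::stoul stops at any ";extension".
        std::size_t size = std::stoul(body.substr(pos, eol - pos), nullptr, 16);
        pos = eol + 2;
        if (size == 0)
            break;  // last chunk; any trailers are ignored in this sketch
        if (pos + size + 2 > body.size())
            throw std::runtime_error("truncated chunk");
        out.append(body, pos, size);
        pos += size + 2;  // skip the chunk data and its trailing CRLF
    }
    return out;
}
```

    You would only call dechunk when the headers actually contain "Transfer-Encoding: chunked"; otherwise the body is used as-is. curl handles all of this for you, which is a large part of why it is worth the setup.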

    To the OP: What exactly are you having trouble with? Any specific errors? What OS/compiler?

  7. #7
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    In PHP, you can simply write:
    Code:
    $handle = fopen("http://www.example.com/", "r");
    and then use it just as you would a text file (this needs allow_url_fopen enabled in php.ini).

    You could also write:
    Code:
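       // Open a TCP connection to the web server (12-second timeout).
       $viart_xml = fsockopen("www.viart.com", 80, $errno, $errstr, 12);
       if (!$viart_xml) die("$errstr ($errno)\n");

       // Send the request; the blank line (\r\n\r\n) ends the headers.
       fputs($viart_xml, "GET /viart_shop.xml HTTP/1.0\r\n");
       fputs($viart_xml, "Host: www.viart.com\r\n");
       fputs($viart_xml, "Referer: http://www.viart.com\r\n");
       fputs($viart_xml, "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n\r\n");
       // Read the reply (status line, headers, then body) until the server closes.
       $response = stream_get_contents($viart_xml);
       fclose($viart_xml);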
    or even use the curl library.
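
    For anyone staying with C++ and raw sockets instead, the request text in the second snippet is just a string you assemble before sending. A minimal sketch, assuming a hypothetical helper named build_get_request (not part of any library) and leaving the actual socket send and receive out:

```cpp
#include <string>

// Hypothetical helper: formats a minimal HTTP/1.0 GET request.
// Each header line ends in CRLF, and an empty line ends the header block.
std::string build_get_request(const std::string& host, const std::string& path) {
    std::string req;
    req += "GET " + path + " HTTP/1.0\r\n";
    req += "Host: " + host + "\r\n";
    req += "Connection: close\r\n";  // ask the server to close when it is done
    req += "\r\n";                   // blank line: end of headers
    return req;
}
```

    The resulting string would be passed to send() on a connected socket; everything the server sends back until it closes the connection is the response.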
