Thread: Access HTML from a program.

  1. #1
    Registered User
    Join Date
    Mar 2009
    Posts
    76

    Access HTML from a program.

    How would I be able to change something on MY website from a program I made? (I know it's possible.)

    Or how would I be able to GET data from a website?

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    That would depend on what you actually wish to do, and where you are located in relation to the web-site.

    For example, if the website is on your local machine, you can just open the file(s) and change them (by hand or by writing code).

    To modify a remote site (one that is not on the machine you are using, and not close enough on the network to just mount the drive), you need to somehow remote-copy the files. You can use the HTTP protocol, for example via wget, to "receive" the pages. Unfortunately, http doesn't (normally) allow you to update, so you still have the problem of copying the pages back. So, to actually MODIFY the content, you probably need something like ftp, remote copy (rcp) or secure remote copy (scp).

    Once you have a method of copying the files back and forth, the process of editing the files is the same.

    A third variant is to remote-login to the relevant machine and either run a program on the remote machine, or edit the files via some text editor on the remote machine over the network. rlogin and ssh are unix/linux commands to do this. Remote Desktop would work across PCs. [Obviously, assuming the machine is set up to accept such a connection]

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by azjherben
    Or how would I be able to GET data from a website.
    If you want to send http requests from a C program, you have to use "inet sockets".
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #4
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    http doesn't (normally) allow you to update, so you still have the problem of copying the pages back. So, to actually MODIFY the content, you probably need something like ftp, remote copy (rcp) or secure remote copy (scp).
    Actually, out of the 6 standard HTTP actions, 3 of them are for modifying the content of a web page (DELETE, PUT, and POST). I'm assuming that is what the OP is talking about.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by bithub
    Actually, out of the 6 standard HTTP actions, 3 of them are for modifying the content of a web page (DELETE, PUT, and POST). I'm assuming that is what the OP is talking about.
    Ah, ok. You learn something new almost every day!


  6. #6
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by bithub
    Actually, out of the 6 standard HTTP actions, 3 of them are for modifying the content of a web page (DELETE, PUT, and POST). I'm assuming that is what the OP is talking about.
    I don't think so. DELETE deletes a resource, PUT will add one, but POST does not modify a page as it exists on the server's HD, which is what the OP is talking about.

  7. #7
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    I don't think so. DELETE deletes a resource, PUT will add one, but POST does not modify a page as it exists on the server's HD
    I view deleting a resource as modifying it. PUT will add or replace a resource. POST will create a new resource.

    does not modify a page as it exists on the server's HD, which is what the OP is talking about.
    How do you know what he is talking about? He said, "change something on MY website". Since most website data these days is stored in SQL databases, any of the 3 HTTP actions I mentioned will "change" something on a website since it will modify the resource representations in the database.
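A rough sketch of what a body-carrying request such as PUT or POST looks like on the wire, in case that helps the discussion. The function name `build_body_request`, the host `www.example.com` and the path `/notes/1.txt` are made up for illustration, and this assumes the server is actually configured to accept the method (many are not).

```cpp
#include <string>

// Build an HTTP/1.1 request that carries a body (e.g. PUT or POST).
// Content-Length tells the server how many body bytes follow the blank line.
// Hypothetical helper for illustration; not part of any library.
std::string build_body_request(const std::string& method, const std::string& host,
                               const std::string& path, const std::string& content_type,
                               const std::string& body) {
    return method + " " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Content-Type: " + content_type + "\r\n"
           "Content-Length: " + std::to_string(body.size()) + "\r\n"
           "Connection: close\r\n"
           "\r\n" + body;
}
```

Calling `build_body_request("PUT", "www.example.com", "/notes/1.txt", "text/plain", "hello")` gives a string you could send over an already connected socket. A DELETE request has the same request line and Host header but normally no body.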

  8. #8
    Registered User
    Join Date
    Mar 2009
    Posts
    76
    I use Winsock (2). Any tutorials?
    Or source code?

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Tutorials on doing what? There have been at least four suggestions on what you MAY want to do. You may need to explain what you are actually trying to do for us to give any meaningful advice.

    For example, is the web-site on a local or remote machine?


  10. #10
    The superhaterodyne twomers
    Join Date
    Dec 2005
    Location
    Ireland
    Posts
    2,273
    libcurl -- cURL and libcurl

  11. #11
    Registered User
    Join Date
    Mar 2009
    Posts
    76
    I want to write a program in C++ (with Winsock (2?))
    that can get me Google's HTML code.

    Internet Explorer and other browsers do it, so it IS possible.

  12. #12
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    How would I beable to change something on MY website
    That can get me googles HTML code.
    Well that's why we're confused - you're talking about two completely different things.

    If you want to write to a file on a server, a lot is going to depend on how your server is set up - there are easier ways than using HTTP, I'm sure. Maybe you could look into FTP, or a PHP script or something. It all depends on what you're trying to do and you really haven't given us any additional details.

    If you just want to read the HTML of a web page, you just use sockets: make a connection to a server (you will have to resolve the host name first; Winsock can do that for you via getaddrinfo, or you could use the IP address), send an HTTP request for the specific resource, and what you get back will be the HTTP response headers followed by the HTML code.

    Code:
    GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n
    Code:
    *BUNCH OF HTTP*\r\n\r\n<html><head>..... </body></html>
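The request and response shown above can be sketched in C++ roughly as follows. This only covers the string handling; the socket part (in Winsock 2 that would be WSAStartup, getaddrinfo, socket, connect, send and recv) is left out. The helper names `build_get_request` and `split_response` are invented for this sketch, and the `Connection: close` header is an added assumption so the client can simply read until the server closes the connection.

```cpp
#include <cassert>
#include <string>

// Build a minimal HTTP/1.1 GET request. The request ends with an empty line
// ("\r\n\r\n"); without it the server keeps waiting for more headers.
std::string build_get_request(const std::string& host, const std::string& path = "/") {
    assert(!host.empty());  // HTTP/1.1 requires a Host header
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Split a raw response into headers and body at the first empty line.
// Returns false if the end of the headers has not been received yet.
bool split_response(const std::string& raw, std::string& headers, std::string& body) {
    const std::string sep = "\r\n\r\n";
    const std::size_t pos = raw.find(sep);
    if (pos == std::string::npos) return false;
    headers = raw.substr(0, pos);
    body = raw.substr(pos + sep.size());
    return true;
}
```

You would send the result of `build_get_request("www.google.com")`, append everything recv returns to one std::string, and pass that to `split_response` once the connection is closed.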

  13. #13
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by azjherben
    I want to write a program in C++ (with Winsock (2?))
    That can get me googles HTML code.

    Internet explorer and other browers do it, so it IS possible.
    Well, if it is done on a computer, it was probably some software that did it.

    You probably want to do it with the ftp protocol.

  14. #14
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    You probably want to do it with the ftp protocol.
    Unless you know of a way to get FTP access to Google's webserver, I think HTTP is probably the proper protocol here.

    You can do as sean suggested and send the HTTP GET request string, and Google will send you back the HTML. The problem is that Google will probably send it back using chunked transfer encoding, so you will need to decode it. If you don't want to worry about writing the code to parse through the chunked encoding (it's not that difficult), you can always take twomers's suggestion, and just use libcurl. libcurl has done the work of implementing the HTTP protocol for you, so you just need to call some library functions to get the HTML content from a website.
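For anyone who does want to decode the chunked encoding by hand, here is a minimal sketch. It assumes the whole response body (everything after the headers) is already in one std::string; the name `decode_chunked` is made up, chunk extensions and trailers are ignored, and the size parsing via strtoul is more lenient than the HTTP specification strictly allows.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Decode a body sent with "Transfer-Encoding: chunked".
// Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a size of 0 ends the body.
// Returns false if the input is truncated or malformed.
bool decode_chunked(const std::string& in, std::string& out) {
    out.clear();
    std::size_t pos = 0;
    for (;;) {
        const std::size_t eol = in.find("\r\n", pos);
        if (eol == std::string::npos) return false;
        // Parse the hex size; strtoul stops at ';', so extensions are skipped.
        const std::string size_line = in.substr(pos, eol - pos);
        char* end = nullptr;
        const unsigned long size = std::strtoul(size_line.c_str(), &end, 16);
        if (end == size_line.c_str()) return false;  // no hex digits found
        pos = eol + 2;
        if (size == 0) return true;  // last chunk; any trailers are ignored
        const std::size_t remaining = in.size() - pos;
        if (size > remaining || remaining - size < 2) return false;  // truncated
        out.append(in, pos, size);
        if (in.compare(pos + size, 2, "\r\n") != 0) return false;  // missing CRLF
        pos += size + 2;
    }
}
```

Only call this when the response headers actually contain `Transfer-Encoding: chunked`; otherwise the body is plain and its length comes from `Content-Length` or from the connection closing.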
