Thread: Help with HTML

  1. #1
    Registered User
    Join Date
    Jul 2009
    Posts
    22

    Help with HTML

    Hi guys,

    I've been asked to write a program to obtain the "view source" contents of a webpage.
    I've looked up a few networking calls, but I'm still a little baffled.
    Can somebody guide me through this?

  2. #2
    Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Nothing to do with HTML... at all.

    * HTTP/1.1: Request
    * Beej's Guide to Network Programming

    Or take the easy road with a library like cURL and libcurl
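    For reference, here is a minimal sketch of what an HTTP/1.1 GET request looks like when built in C. The helper name `build_get_request` is hypothetical. HTTP/1.1 requires a `Host` header, and `Connection: close` asks the server to close the socket after the response, so the client can simply read until EOF.

    Code:
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats an HTTP/1.1 GET request for `path` on `host`
 * into buf. Returns the request length, or -1 if an argument contains
 * CR/LF (which would corrupt the request) or buf is too small. */
int build_get_request(char *buf, size_t buflen,
                      const char *host, const char *path)
{
    if (strpbrk(host, "\r\n") != NULL || strpbrk(path, "\r\n") != NULL)
        return -1;

    /* Request line, the Host header HTTP/1.1 requires, and
     * "Connection: close" so the server closes the socket when done.
     * The empty line ("\r\n") at the end terminates the headers. */
    int n = snprintf(buf, buflen,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= buflen)
        return -1;   /* encoding error or output truncated */
    return n;
}
```

    Usage: `build_get_request(req, sizeof req, "www.example.com", "/index.html")` produces the text to write to a connected socket.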

  3. #3
    Registered User
    Join Date
    Jul 2009
    Posts
    22
    Hi... I just checked Beej's guide to networking, but it still doesn't give me any information about displaying the view source of the HTML.

  4. #4
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote (originally posted by rakesh_01):
    Hi... I just checked Beej's guide to networking, but it still doesn't give me any information about displaying the view source of the HTML.
    After the TCP/IP headers (which you never actually see when using sockets, since the operating system handles them), you read the HTTP headers. The HTML begins after the first blank line, which marks the end of the HTTP headers.
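    A minimal sketch of "everything after the first blank line", assuming the whole response has already been read into a NUL-terminated buffer. The function name `find_body` is hypothetical. Because `strstr` stops at the first NUL byte, this is fine for HTML but not for binary bodies such as images.

    Code:
```c
#include <string.h>

/* Returns a pointer to the first byte of the body in a raw HTTP response
 * (everything after the "\r\n\r\n" that ends the headers), or NULL if the
 * end of the headers has not been received yet. resp must be NUL-terminated. */
const char *find_body(const char *resp)
{
    const char *end = strstr(resp, "\r\n\r\n");
    return end ? end + 4 : NULL;   /* skip the 4-byte CRLF CRLF separator */
}
```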
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #5
    Registered User
    Join Date
    Jul 2009
    Posts
    22
    I read through the HTTP/1.1 spec and Beej's guide to networking, but I still can't come up with a complete program that puts it all together. Sorry, but I'm quite a beginner at C, so I would need your help.

  6. #6
    Registered User
    Join Date
    Jul 2009
    Posts
    22
    Hi Sir,

    I looked at the GET method and checked out the various networking calls in Beej's guide to networking. I want to know how to write a complete C program that sends a GET request using those calls.

  7. #7
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > sorry but i'm quite a beginner to C
    Spend a few months learning the core language then.

    Writing a useful chunk of the HTTP client program isn't likely in much less than 1000 lines of pretty intense code. If you're only just past "hello world", it's not for you yet.

    cURL is by far the easiest way forward for you at the moment.
    You can get the source code for cURL, but don't bank on being able to understand much of it quickly.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  8. #8
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    Quote (originally posted by zacs7):
    Nothing to do with HTML... at all.
    Indeed... Displaying the HTML source means you're just going to display the raw data you get back. So what you need to do is connect to port 80 with a socket, send the appropriate HTTP request, read in the data you get back, and display everything after the first blank line (i.e. the first occurrence of "\r\n\r\n"), which will be the HTML file.
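    A minimal sketch of the "connect to port 80" and "read in the data" steps on a POSIX system such as Linux, following the `getaddrinfo`/`socket`/`connect` pattern from Beej's guide. `connect_tcp` and `read_all` are hypothetical helper names. `read_all` assumes the server closes the connection when it is done, which holds for an HTTP/1.0 request or an HTTP/1.1 request with `Connection: close`.

    Code:
```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens a TCP connection to host:port (e.g. "80"). Returns a socket fd or -1. */
int connect_tcp(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each returned address until one connects. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Reads from fd until EOF into a malloc'd, NUL-terminated buffer and stores
 * the byte count in *len. Returns NULL on error; the caller frees the buffer. */
char *read_all(int fd, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (cap - used < 1024) {      /* grow, keeping room for the NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + used, cap - used - 1);
        if (n < 0) { free(buf); return NULL; }
        if (n == 0)
            break;                    /* peer closed the connection */
        used += (size_t)n;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

    Typical use: `int fd = connect_tcp("www.example.com", "80");`, write the GET request to `fd`, call `read_all(fd, &len)`, then print everything after the first "\r\n\r\n" and `close(fd)`.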

    If any of that doesn't make sense to you, then you either need to read the aforementioned tutorials more thoroughly, or as Salem suggested - continue learning basic C before attempting this project.

  9. #9
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote:
    ...and display everything after the first blank line (i.e. the first occurrence of "\r\n\r\n") - which will be the HTML file.
    It's not quite that easy. The HTTP body can be encoded (for example with chunked transfer encoding, or compressed with a content encoding such as gzip), which means you need to decode it before displaying it. This means you have to parse the HTTP headers before looking at the data.
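    One common case is `Transfer-Encoding: chunked`, which HTTP/1.1 servers often use for dynamically generated pages: the body arrives as a series of chunks, each prefixed by its length in hex, ending with a zero-length chunk. A server should not send chunked encoding in reply to an HTTP/1.0 request. Below is a minimal decoding sketch; `decode_chunked` is a hypothetical name, it ignores trailer headers, and gzip content encoding would additionally need a library such as zlib.

    Code:
```c
#include <stdint.h>
#include <string.h>

/* Decodes a body sent with "Transfer-Encoding: chunked".
 * in/inlen: raw body bytes (after the blank line that ends the headers).
 * out/outcap: destination buffer (not NUL-terminated by this function).
 * Returns the decoded length, or -1 if the input is malformed, truncated,
 * or does not fit in out. Trailer headers after the last chunk are ignored. */
long decode_chunked(const char *in, size_t inlen, char *out, size_t outcap)
{
    size_t pos = 0, used = 0;

    for (;;) {
        /* Parse the chunk size, written in hexadecimal. */
        size_t size = 0;
        int digits = 0;
        while (pos < inlen) {
            char c = in[pos];
            int v;
            if (c >= '0' && c <= '9') v = c - '0';
            else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
            else break;
            if (size > (SIZE_MAX - 15) / 16)
                return -1;            /* size would overflow */
            size = size * 16 + (size_t)v;
            digits++;
            pos++;
        }
        if (digits == 0)
            return -1;

        /* Skip optional chunk extensions (";name=value") up to CRLF. */
        while (pos < inlen && in[pos] != '\r')
            pos++;
        if (pos + 1 >= inlen || in[pos + 1] != '\n')
            return -1;
        pos += 2;

        if (size == 0)
            return (long)used;        /* last chunk reached */

        if (size > inlen - pos || size > outcap - used)
            return -1;                /* truncated input or output too small */
        memcpy(out + used, in + pos, size);
        used += size;
        pos += size;

        /* Each chunk's data is followed by CRLF. */
        if (pos + 1 >= inlen || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```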

    As Salem mentioned, this is not a trivial task. You need to spend some time learning C and network programming before you dive into a project like this.

  10. #10
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote (originally posted by Salem):
    Writing a useful chunk of the HTTP client program isn't likely in much less than 1000 lines of pretty intense code.
    That is somewhat of an exaggeration. Pretty sure you can do something simple in less than a hundred.

    The hard part is using the socket/networking API. After that, viz. "embedding the GET method", you just send a string, e.g.:
    Code:
    sprintf(message,"GET /%s HTTP/1.0\r\n\r\n",image.path);
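    Once `message` holds the request, it still has to be pushed through the socket, and `write()` or `send()` may transfer fewer bytes than requested. Below is a minimal sketch assuming a connected socket `fd` on a POSIX system; `send_string` is a hypothetical helper name. Using `snprintf(message, sizeof message, ...)` instead of `sprintf` would also guard against an overlong path, assuming `message` is an array.

    Code:
```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes the whole NUL-terminated string s to fd.
 * write() (like send()) may transfer fewer bytes than requested, so loop
 * until everything has been written. Returns 0 on success, -1 on error. */
int send_string(int fd, const char *s)
{
    size_t len = strlen(s);
    while (len > 0) {
        ssize_t n = write(fd, s, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        s += n;             /* advance past the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```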
    The HTTP headers that come back are plain text. A client doesn't have to write them, it just has to interpret them. Here's a place to start:

    List of HTTP headers - Wikipedia, the free encyclopedia
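    As a rough illustration of interpreting the headers, here is a sketch that pulls the status code out of the first line and looks up a header by name (header names are case-insensitive). `parse_status` and `get_header` are hypothetical names. The code assumes the response is NUL-terminated text with CRLF line endings, and it does not handle header values folded across several lines.

    Code:
```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Parses the status code from a status line such as "HTTP/1.0 200 OK".
 * Returns the code, or -1 if the text doesn't start with a status line. */
int parse_status(const char *resp)
{
    int major, minor, code;
    if (sscanf(resp, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Copies the value of header `name` (matched case-insensitively) into out,
 * truncating if needed. Only the header section is searched, never the body.
 * Returns 1 if found, 0 if not (or if outlen is 0). */
int get_header(const char *resp, const char *name, char *out, size_t outlen)
{
    if (outlen == 0)
        return 0;
    size_t nlen = strlen(name);
    const char *end = strstr(resp, "\r\n\r\n");  /* end of the headers */
    const char *line = strstr(resp, "\r\n");     /* end of the status line */

    while (line != NULL && (end == NULL || line < end)) {
        line += 2;                               /* start of next header line */
        if (strncasecmp(line, name, nlen) == 0 && line[nlen] == ':') {
            const char *v = line + nlen + 1;
            while (*v == ' ' || *v == '\t')
                v++;                             /* skip whitespace after ':' */
            size_t vlen = strcspn(v, "\r\n");
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return 1;
        }
        line = strstr(line, "\r\n");
    }
    return 0;
}
```

    For example, checking whether `get_header(resp, "Transfer-Encoding", buf, sizeof buf)` yields "chunked" tells you whether the body needs decoding before it is displayed.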

    You might want to look at this too:

    http://www.intergate.com/~halfcountp...grabimage.html

    I wrote that as an exercise; it's for *nix systems, but a Windows version would be pretty similar, methinks. It works like this:

    grabimage www.somewhere.com/path/picture.jpg

    to copy an image off the web into a local file. That means contacting the server, sending a GET request, and parsing the response, including the HTTP header. Dealing with a web page is *exactly* the same thing, except that in place of binary image data there is HTML.
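    The first step of a tool like this is turning the command-line argument into a host name (for the address lookup) and a path (for the request line). Below is a minimal sketch; `split_url` is a hypothetical helper, not taken from MK27's program, and it does not handle ports or `https://` URLs.

    Code:
```c
#include <string.h>

/* Splits "www.somewhere.com/path/picture.jpg" (an optional "http://" prefix
 * is skipped) into host and path. The path is returned without its leading
 * '/', to suit a format string like "GET /%s HTTP/1.0\r\n\r\n".
 * Returns 0 on success, -1 if the host is empty or a buffer is too small. */
int split_url(const char *url, char *host, size_t hostlen,
              char *path, size_t pathlen)
{
    if (strncmp(url, "http://", 7) == 0)
        url += 7;

    size_t hlen = strcspn(url, "/");   /* host runs up to the first '/' */
    const char *rest = url + hlen;
    if (*rest == '/')
        rest++;                        /* drop the leading slash */

    if (hlen == 0 || hlen >= hostlen || strlen(rest) >= pathlen)
        return -1;
    memcpy(host, url, hlen);
    host[hlen] = '\0';
    strcpy(path, rest);                /* length already checked above */
    return 0;
}
```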
    Last edited by MK27; 07-21-2009 at 06:59 AM.

  11. #11
    Registered User
    Join Date
    Jul 2009
    Posts
    22
    Thanks so much for the advice, guys. I've got a faint idea of it now. However, I'm looking for a program that implements the GET method of HTTP. Can someone post one for me?

  12. #12
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote (originally posted by rakesh_01):
    Thanks so much for the advice, guys. I've got a faint idea of it now. However, I'm looking for a program that implements the GET method of HTTP. Can someone post one for me?
    If you had actually read my last post, you might have noticed I explicitly did that.

  13. #13
    Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Quote (originally posted by MK27):
    If you had actually read my last post, you might have noticed I explicitly did that.
    No, but you didn't read it for him

  14. #14
    Registered User
    Join Date
    Jul 2009
    Posts
    2

    Quote (originally posted by rakesh_01):
    I've been asked to write a program to obtain the "view source" contents of a webpage...
    This is a one-liner in biterscripting (http://www.biterscripting.com). Assume you want to view the source of "http://www.something.com/somepage.someextension".

    Code:
    cat "http://www.something.com/somepage.someextension"
    The above command will show you the source for the page.

    Code:
    cat "http://www.something.com/somepage.someextension" > "X.txt"
    The above will save the source to file X.txt.

    Code:
    cat "http://www.something.com/somepage.someextension" > "X.txt"
    system start "X.txt"
    The above will save the source in file X.txt, then open that file for viewing in a separate window.

    You will see the exact same source that a web browser will show.


    Sen

  15. #15
    Registered User
    Join Date
    Jul 2009
    Posts
    22
    Thanks so much, Sen.

    I guess that's another approach to this issue. However, I'm working on a Linux server, and I'm looking to write a program in C to achieve this.
