Hi guys,
I've been asked to write a program to obtain the "view source" contents of a webpage.
I've just looked up a few networking calls, but I'm still a little baffled.
Can somebody guide me through this?
Nothing to do with HTML... at all.
* HTTP/1.1: Request
* Beej's Guide to Network Programming
Or take the easy road with a library like cURL and libcurl
Hi, I just checked Beej's guide to networking, but it still doesn't give me any information about displaying the "view source" of the HTML.
C programming resources:
GNU C Function and Macro Index -- glibc reference manual
The C Book -- nice online learner guide
Current ISO draft standard
CCAN -- new CPAN like open source library repository
3 (different) GNU debugger tutorials: #1 -- #2 -- #3
cpwiki -- our wiki on sourceforge
I read through the HTTP/1.1 document and Beej's guide to networking, but I still can't come up with a complete program that puts it all together. Sorry, but I'm quite a beginner to C, so I would need your help.
Hi Sir,
I saw the GET method and checked out the various networking calls from Beej's Guide to Network Programming. I want to know how to write a complete C program that sends a GET request using those calls.
> sorry but i'm quite a beginner to C
Spend a few months learning the core language then.
Writing a useful chunk of the HTTP client program isn't likely in much less than 1000 lines of pretty intense code. If you're only just past "hello world", it's not for you yet.
CURL is by far the easiest way forward for you at the moment.
You can get the source code for CURL, but don't bank on being able to understand much of it quickly.
If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
If at first you don't succeed, try writing your phone number on the exam paper.
> Nothing to do with HTML... at all.
Indeed... Displaying the HTML source means you're just going to display the raw data you get back. So what you need to do is connect to port 80 with a socket, send the appropriate HTTP request, read in the data you get back, and display everything after the first blank line (i.e. the first occurrence of "\r\n\r\n") - which will be the HTML file.
If any of that doesn't make sense to you, then you either need to read the aforementioned tutorials more thoroughly, or as Salem suggested - continue learning basic C before attempting this project.
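The steps described above (connect to port 80, send the request, read the reply, show everything after the first "\r\n\r\n") could be sketched roughly as follows. This is only a minimal sketch assuming a POSIX system such as Linux; the function names fetch_raw and find_body are made up for illustration, and it does not decode chunked or compressed bodies.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Return a pointer to the first byte after the blank line that ends the
 * headers, or NULL if no "\r\n\r\n" is present. Treats the reply as a
 * C string, so it is only suitable for text such as HTML. */
const char *find_body(const char *response)
{
    const char *sep = strstr(response, "\r\n\r\n");
    return sep ? sep + 4 : NULL;
}

/* Connect to host on port 80, send request, and read the reply into buf
 * (always NUL-terminated, truncated if buf is too small).
 * Returns the number of bytes stored, or -1 on failure. */
long fetch_raw(const char *host, const char *request, char *buf, size_t bufsize)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;
    size_t used = 0;
    ssize_t n;

    if (bufsize == 0)
        return -1;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, "80", &hints, &res) != 0)
        return -1;

    /* Try each returned address until one connects. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    /* send() may accept fewer bytes than requested, so loop. */
    size_t len = strlen(request), sent = 0;
    while (sent < len) {
        n = send(fd, request + sent, len - sent, 0);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        sent += (size_t)n;
    }

    /* Read until the server closes the connection (recv returns 0). */
    while (used < bufsize - 1 &&
           (n = recv(fd, buf + used, bufsize - 1 - used, 0)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fd);
    return (long)used;
}
```

A caller might pass a request such as "GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n" (www.example.com is a placeholder host) to fetch_raw and then print whatever find_body returns.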
> ...and display everything after the first blank line (i.e. the first occurrence of "\r\n\r\n") - which will be the HTML file.
It's not quite that easy. The HTTP data can be encoded, which means you need to decode it before displaying it. This means you have to parse through the HTTP headers before looking at the data.
As Salem mentioned, this is not a trivial task. You need to spend some time learning C and network programming before you dive into a project like this.
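To give an idea of what "parsing through the HTTP headers" can look like, here is a small sketch that pulls the status code and a single header value out of a reply held in memory. The helper names http_status and header_value are hypothetical, strncasecmp is assumed to be available (it is on POSIX systems), and the sketch ignores folded (multi-line) headers. A client could use it to check for Transfer-Encoding or Content-Encoding before deciding whether the body needs decoding.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Parse the status code from a status line such as "HTTP/1.1 200 OK".
 * Returns the code, or -1 if the line does not look like a status line. */
int http_status(const char *response)
{
    int major, minor, code;
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Copy the value of header `name` (matched case-insensitively) into out.
 * Only lines before the blank line that ends the headers are examined.
 * Returns 1 if the header was found, 0 otherwise. */
int header_value(const char *response, const char *name,
                 char *out, size_t outsize)
{
    size_t namelen = strlen(name);
    const char *line = strstr(response, "\r\n");   /* end of status line */

    if (outsize == 0)
        return 0;
    while (line != NULL) {
        line += 2;                                  /* start of next line */
        if (strncmp(line, "\r\n", 2) == 0)          /* blank line: headers end */
            break;
        const char *eol = strstr(line, "\r\n");
        if (strncasecmp(line, name, namelen) == 0 && line[namelen] == ':') {
            const char *val = line + namelen + 1;
            while (*val == ' ' || *val == '\t')     /* skip leading blanks */
                val++;
            size_t len = eol ? (size_t)(eol - val) : strlen(val);
            if (len >= outsize)
                len = outsize - 1;                  /* truncate to fit */
            memcpy(out, val, len);
            out[len] = '\0';
            return 1;
        }
        line = eol;
    }
    return 0;
}
```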
That is somewhat of an exaggeration. Pretty sure you can do something simple in less than a hundred.
The hard part is using the socket/networking API. After that, viz. "embedding the GET method", you just send a string, e.g.:
Code:
sprintf(message,"GET /%s HTTP/1.0\r\n\r\n",image.path);
The HTTP header that comes back is plain text. You don't have to write them for a client, you just have to interpret them. Here's a place to start:
List of HTTP headers - Wikipedia, the free encyclopedia
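As a variation on the sprintf line in this post, a request string could also be built with snprintf so that an over-long path cannot overflow the buffer. This is just a sketch: build_get_request is a made-up name, the path is assumed to already begin with "/", and the Host header is an addition that many servers expect even though HTTP/1.0 does not require it.

```c
#include <stdio.h>
#include <string.h>

/* Write "GET <path> HTTP/1.0" plus a Host header and the terminating
 * blank line into buf. Returns the request length, or -1 if buf is
 * too small to hold the whole request. */
int build_get_request(char *buf, size_t bufsize,
                      const char *host, const char *path)
{
    int n = snprintf(buf, bufsize,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",              /* blank line ends the request */
                     path, host);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;                        /* error or truncated output */
    return n;
}
```

The returned length can be handed straight to send() along with the buffer.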
You might want to look at this too:
http://www.intergate.com/~halfcountp...grabimage.html
I wrote that as an exercise; it's for *nix systems but a windows version will be pretty similar methinks. It works like this:
grabimage ww.somewhere.com/path/picture.jpg
to copy an image off the web into a local file. That means contacting the server, sending a GET request, and parsing the response including the HTTP header. Dealing with a web page is *exactly* the same thing, except in the place of binary image data there is HTML.
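A command-line argument in the host/path form shown above has to be split into a host name (for the connection) and a path (for the GET line) before anything else can happen. The following is one possible sketch of that step, not the code from grabimage; split_url is a hypothetical helper, and it handles only plain "host/path" strings with an optional "http://" prefix (no port numbers, no https).

```c
#include <string.h>

/* Split "host/path" into host and "/path". A leading "http://" is
 * skipped, and a missing path becomes "/".
 * Returns 0 on success, -1 if the host is empty or a buffer is too small. */
int split_url(const char *url, char *host, size_t hostsize,
              char *path, size_t pathsize)
{
    if (strncmp(url, "http://", 7) == 0)
        url += 7;

    const char *slash = strchr(url, '/');
    size_t hostlen = slash ? (size_t)(slash - url) : strlen(url);
    const char *p = slash ? slash : "/";

    if (hostlen == 0 || hostlen >= hostsize || strlen(p) >= pathsize)
        return -1;

    memcpy(host, url, hostlen);
    host[hostlen] = '\0';
    strcpy(path, p);      /* length was checked above */
    return 0;
}
```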
Last edited by MK27; 07-21-2009 at 06:59 AM.
Thanks so much for the advice, guys. I've got a faint idea of it now. However, I'm looking for a program that implements the GET method of HTTP. Can someone post that for me?
> I've been asked to write a program to obtain the "view source" contents of a webpage...
This is a one-liner in biterscripting ( http://www.biterscripting.com ). Assume you want to view the source of "http://www.something.com/somepage.someextension".
Code:
cat "http://www.something.com/somepage.someextension"
The above command will show you the source for the page.
Code:
cat "http://www.something.com/somepage.someextension" > "X.txt"
The above will save the source to file X.txt.
Code:
cat "http://www.something.com/somepage.someextension" > "X.txt" system start "X.txt"
The above will save the source in file X.txt, then open that file for viewing in a separate window.
You will see the exact same source that a web browser will show.
Sen
Thanks so much, Sen.
I guess that's another approach to this issue. However, I'm working on a Linux server, and I'm looking to write a program in C to achieve this.