Help with HTML

**rakesh_01** · 07-14-2009

Hi guys,

I've been asked to write a program to obtain the "view source" contents of a webpage...
I've just looked up a few networking calls, but I'm still a little baffled.
Can somebody guide me through this?

**zacs7** · 07-14-2009

Nothing to do with HTML... at all.

* HTTP/1.1: Request
* Beej's Guide to Network Programming

Or take the easy road with a library like libcurl (the library behind cURL).

**rakesh_01** · 07-16-2009

Hi... I just checked Beej's guide to networking, but it still doesn't give me any information about displaying the view source of the HTML.

**MK27** · 07-16-2009

Originally Posted by rakesh_01

Hi... I just checked Beej's guide to networking, but it still doesn't give me any information about displaying the view source of the HTML.

You don't need to read the TCP/IP headers yourself; the socket API handles them for you. What you read from the socket starts with the HTTP header, and the HTML begins after the first blank line, which marks the end of that header.
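A minimal sketch of that split, assuming the whole reply has already been read into a NUL-terminated buffer. The function name `find_body` is hypothetical, not part of any library:

```c
#include <string.h>

/* Return a pointer to the first byte after the blank line that ends the
 * HTTP header, or NULL if that separator is not in the buffer (yet).
 * Assumes `response` is a NUL-terminated copy of the raw reply. */
const char *find_body(const char *response)
{
    const char *sep = strstr(response, "\r\n\r\n");
    return sep ? sep + 4 : NULL;   /* skip the four separator bytes */
}
```

With a reply held in `reply`, something like `fputs(find_body(reply), stdout)` would print only the HTML, provided the result is checked for NULL first.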

**rakesh_01** · 07-20-2009

I read through the HTTP 1.1 spec and Beej's guide to networking, but I still can't come up with a complete program that ties it all together. Sorry, but I'm quite a beginner in C, so I would need your help.

**rakesh_01** · 07-20-2009

Hi Sir,

I saw the GET method and checked out the various networking calls in Beej's Guide to Network Programming. I want to know how to write a complete C program that sends a GET request using those calls.

**Salem** · 07-20-2009

> sorry but i'm quite a beginner to C
Spend a few months learning the core language then.

Writing a useful chunk of the HTTP client program isn't likely in much less than 1000 lines of pretty intense code. If you're only just past "hello world", it's not for you yet.

CURL is by far the easiest way forward for you at the moment.
You can get the source code for CURL, but don't bank on being able to understand much of it quickly.

**sean** · 07-20-2009

> Nothing to do with HTML... at all.

Indeed... Displaying the HTML source means you're just going to display the raw data you get back. So what you need to do is connect to port 80 with a socket, send the appropriate HTTP request, read in the data you get back, and display everything after the first blank line (i.e. the first occurrence of "\r\n\r\n"), which will be the HTML file.
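Those steps can be sketched in C with the POSIX socket calls that Beej's guide covers (`getaddrinfo`, `socket`, `connect`, `send`, `recv`). This is a sketch under assumptions, not a complete client: `connect_tcp` and `fetch_raw` are hypothetical names, the caller supplies an already formatted request, and the reply is copied out unmodified, headers included.

```c
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo under strict -std= modes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Connect to host:port over TCP; returns a socket descriptor or -1. */
static int connect_tcp(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each address returned until one accepts the connection. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Send `request` to host:port and copy the raw reply (headers and body)
 * to `out`. Returns the number of bytes received, or -1 on error. */
long fetch_raw(const char *host, const char *port,
               const char *request, FILE *out)
{
    char buf[4096];
    size_t len = strlen(request), sent = 0;
    long total = 0;
    ssize_t n;
    int fd = connect_tcp(host, port);

    if (fd == -1)
        return -1;

    while (sent < len) {               /* send() may accept only part */
        n = send(fd, request + sent, len - sent, 0);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        sent += (size_t)n;
    }

    /* With an HTTP/1.0 request the server closes the connection when
     * it has finished, so read until recv() reports end of stream. */
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0) {
        fwrite(buf, 1, (size_t)n, out);
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

A call might look like `fetch_raw("www.example.com", "80", "GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n", stdout);` where the host name is only a placeholder.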

If any of that doesn't make sense to you, then you either need to read the aforementioned tutorials more thoroughly, or as Salem suggested - continue learning basic C before attempting this project.

**bithub** · 07-20-2009

> ...and display everything after the first blank line (i.e. the first occurrence of "\r\n\r\n") - which will be the HTML file.

It's not quite that easy. The HTTP body can be encoded (for example with chunked transfer encoding or gzip compression), which means you need to decode it before displaying it. That in turn means parsing the HTTP headers before looking at the data.
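As a rough illustration of that header check, the sketch below looks up one header by name so a client can tell whether, say, `Transfer-Encoding: chunked` applies. `header_value` is a hypothetical helper, and it assumes the header block is NUL-terminated with `\r\n` line endings; actual decoding of the body is not shown.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Look up header `name` in a raw HTTP header block and copy its value
 * into `out`. Header names are compared case-insensitively.
 * Returns 1 if the header was found, 0 otherwise. */
int header_value(const char *headers, const char *name,
                 char *out, size_t outlen)
{
    size_t namelen = strlen(name);
    const char *line = headers;

    if (outlen == 0)
        return 0;

    /* Walk line by line and stop at the blank line that ends the header. */
    while (line && *line && strncmp(line, "\r\n", 2) != 0) {
        const char *eol = strstr(line, "\r\n");
        size_t linelen = eol ? (size_t)(eol - line) : strlen(line);

        if (linelen > namelen && line[namelen] == ':' &&
            strncasecmp(line, name, namelen) == 0) {
            const char *v = line + namelen + 1;
            const char *end = line + linelen;
            size_t n;

            while (v < end && (*v == ' ' || *v == '\t'))
                v++;                    /* skip whitespace after ':' */
            n = (size_t)(end - v);
            if (n >= outlen)
                n = outlen - 1;         /* truncate to fit the buffer */
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        line = eol ? eol + 2 : NULL;
    }
    return 0;
}
```

A client could call `header_value(reply, "Transfer-Encoding", buf, sizeof buf)` and only print the body as-is when no encoding header is present.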

As Salem mentioned, this is not a trivial task. You need to spend some time learning C and network programming before you dive into a project like this.

**MK27** · 07-20-2009

Originally Posted by Salem

Writing a useful chunk of the HTTP client program isn't likely in much less than 1000 lines of pretty intense code.

That is somewhat of an exaggeration. Pretty sure you can do something simple in less than a hundred.

The hard part is using the socket/networking API. After that, as for "embedding the GET method", you just send a string, e.g.:

Code:

sprintf(message,"GET /%s HTTP/1.0\r\n\r\n",image.path);
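A bounded variant of the line above, as a sketch: `snprintf` will not write past the end of the buffer, and a `Host` header is added because many servers host several sites on one address (it is not required by HTTP/1.0, but servers generally accept it). `build_get_request` and its parameters are illustrative names, not taken from grabimage:

```c
#include <stdio.h>
#include <stddef.h>

/* Format a minimal HTTP/1.0 GET request into `buf`.
 * `path` is given without its leading slash, as in the line above.
 * Returns the request length, or -1 if it does not fit in `buflen`. */
int build_get_request(char *buf, size_t buflen,
                      const char *host, const char *path)
{
    int n = snprintf(buf, buflen,
                     "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);

    if (n < 0 || (size_t)n >= buflen)
        return -1;                     /* encoding error or truncated */
    return n;
}
```

The returned length can be passed straight to `send()`, which avoids a separate `strlen` call.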

The HTTP header that comes back is plain text. For a client you don't have to write response headers, you just have to interpret them. Here's a place to start:

List of HTTP headers - Wikipedia, the free encyclopedia

You might want to look at this too:

http://www.intergate.com/~halfcountp...grabimage.html

I wrote that as an exercise; it's for *nix systems but a windows version will be pretty similar methinks. It works like this:

grabimage www.somewhere.com/path/picture.jpg

to copy an image off the web into a local file. That means contacting the server, sending a GET request, and parsing the response including the HTTP header. Dealing with a web page is *exactly* the same thing, except in the place of binary image data there is HTML.
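Parsing the response usually starts with the status line. A small sketch, assuming the reply begins with a line such as `HTTP/1.1 200 OK`; `status_code` is a made-up helper name and is not part of grabimage:

```c
#include <stdio.h>

/* Extract the numeric status code from the first line of an HTTP reply,
 * e.g. 200 from "HTTP/1.1 200 OK". Returns -1 if the line is malformed. */
int status_code(const char *response)
{
    int code;

    /* %*d reads and discards the major and minor version numbers. */
    if (sscanf(response, "HTTP/%*d.%*d %d", &code) != 1)
        return -1;
    if (code < 100 || code > 599)
        return -1;                     /* outside the defined ranges */
    return code;
}
```

A client would typically save or display the body only when this returns 200, and treat 3xx codes as redirects to follow.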

**rakesh_01** · 07-21-2009

Thanks so much for the advice, guys. I've got a faint idea of it now. However, I'm looking for a program that implements the GET method of HTTP. Can someone post that for me?

**MK27** · 07-21-2009

Originally Posted by rakesh_01

Thanks so much for the advice, guys. I've got a faint idea of it now. However, I'm looking for a program that implements the GET method of HTTP. Can someone post that for me?

If you had actually read my last post, you might have noticed I explicitly did that.

**zacs7** · 07-21-2009

Originally Posted by MK27

If you had actually read my last post, you might have noticed I explicitly did that.

No, but you didn't read it for him

**SenHu** · 07-22-2009

> I've been asked to write a program to obtain the "view source" contents of a webpage...

This is a one liner in biterscripting ( http://www.biterscripting.com ) . Assume you want to view source of "http://www.something.com/somepage.someextension".

Code:

cat "http://www.something.com/somepage.someextension"

The above command will show you the source for the page.

Code:

cat "http://www.something.com/somepage.someextension" > "X.txt"

The above will save the source to file X.txt.

Code:

cat "http://www.something.com/somepage.someextension" > "X.txt"
system start "X.txt"

The above will save the source in file X.txt, then open that file for viewing in a separate window.

You will see the exact same source that a web browser will show.

Sen

**rakesh_01** · 07-22-2009

Thanks so much Sen,

I guess that's another approach to this issue. However, I'm working on a Linux server, and I'm looking to write a program in C to achieve this.
