C Board  

Old 12-21-2006, 08:18 PM   #1
User
 
Join Date: Jan 2006
Location: Canada
Posts: 496
Need help with internet socket program

Hello, I'm still a newbie at this network socket stuff, so don't get mad at me if this is a simple problem.

I'm trying to retrieve HTML source from a webpage, and so I use basic network sockets as described in Beej's guide. If you want, I'll post it here, but the code seems fine to me. The problem is that when I try to retrieve a small webpage, it works, but larger ones get cut off. For example, if I try to load Google.com, here is my output from my program:
Code:
Data recieved:
HTTP/1.1 200 OK

Cache-Control: private

Content-Type: text/html

Set-Cookie: PREF=ID=5dbd2c69968fe1ba:TM=1166753338:LM=1166753338:S=f46he7TFtBY5uZrX; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.ca

Server: GWS/2.1

Transfer-Encoding: chunked

Date: Fri, 22 Dec 2006 02:08:58 GMT



b5b

<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif}
.h{font-size:20px}
.h{color:#3366cc}
.q{color:#00c}
--></style>
<script defer>
<!--
function sf(){document.f.q.focus();}
// -->
</script>
</head>
.
.
.
</script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Web</b>&nbsp;&nbsp;&nbsp;&nbsp;<a class=q href="http://images.g
At first, I thought this might be due to an input buffer that was too small, but increasing the buffer size changed nothing. It seems as though all the data can't fit in a single packet, but if I try calling recv() multiple times, I just get the result posted above over and over again instead of the remaining data. I've Googled and can't seem to find the answer; please help!

Thanks in advance.
joeprogrammer is offline   Reply With Quote
Old 12-21-2006, 10:51 PM   #2
int x = *((int *) NULL);
 
Cactus_Hugger's Avatar
 
Join Date: Jul 2003
Location: Banks of the River Styx
Posts: 902
Likely the data won't fit in a single packet, and you should probably be calling recv() in a loop. (It would be helpful to see some code to know exactly what's happening.) The most common error is ignoring the return value of recv(): it returns how many bytes it actually put into your buffer, which may be fewer than you asked for. Something like:
Code:
while(1)
{
   ret = recv(my_socket, buffer, buffer_size, 0);
   if(ret == 0) break; // Peer closed the connection: all done.
   if(ret < 0) break; // Error.
   // Otherwise, process. buffer contains ret bytes of data.
   // (If you want to printf() buffer, do a buffer[ret] = 0 first,
   //  and be sure buffer is at least buffer_size + 1 bytes if you do.)
}
Also, not to deter you from your exploration of socket programming, but notice how your example above mentions chunked encoding. It's not the only encoding out there, and if you don't decode the data, you'll get weird results. (Like broken JPEGs, in my case, which frustrated me for quite a while.) If you're going to do a lot with just HTTP, look into libcurl. It'll keep you from reinventing wheels.
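For what it's worth, the "b5b" line in the output above is the size of the first chunk in hex (2907 bytes). Here is a rough sketch of what decoding a chunked body could look like. The helper name decode_chunked is made up for illustration, it assumes the whole body is already in memory and starts right after the blank line that ends the headers, and it ignores trailers and chunk extensions:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not from any library): decode an HTTP/1.1 chunked body.
// `body` must start at the first chunk-size line, i.e. just after the headers.
// Returns false if the data is truncated or malformed.
bool decode_chunked(const std::string &body, std::string &out)
{
    std::string::size_type pos = 0;
    out.clear();
    while (true) {
        // Each chunk starts with its size in hex, terminated by CRLF.
        std::string::size_type eol = body.find("\r\n", pos);
        if (eol == std::string::npos) return false;       // size line incomplete
        std::string size_line = body.substr(pos, eol - pos);
        char *end = 0;
        unsigned long size = std::strtoul(size_line.c_str(), &end, 16);
        if (end == size_line.c_str()) return false;       // no hex digits found
        pos = eol + 2;
        if (size == 0) return true;                        // zero-size chunk: done
        std::string::size_type remaining = body.size() - pos;
        if (size > remaining || remaining - size < 2) return false;  // chunk not fully received
        out.append(body, pos, size);
        pos += size;
        if (body.compare(pos, 2, "\r\n") != 0) return false;  // chunk data must end with CRLF
        pos += 2;
    }
}
```

You would call it on everything after the first "\r\n\r\n" in the response; a false return on otherwise sane data usually just means you haven't received it all yet.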
__________________
long time; /* know C? */
Unprecedented performance: Nothing ever ran this slow before.
Any sufficiently advanced bug is indistinguishable from a feature.
Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
The best way to accelerate an IBM is at 9.8 m/s/s.
recursion (re - cur' - zhun) n. 1. (see recursion)
Cactus_Hugger is offline   Reply With Quote
Old 12-23-2006, 07:11 PM   #3
User
 
Join Date: Jan 2006
Location: Canada
Posts: 496
Thanks for the tip. I've tried that already, but what happens is that I get an endless loop. I guess it's time you had a look at my code, so here it is:
Code:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <fcntl.h>
#include <unistd.h>

#include <cstdlib>	// exit
#include <cstring>	// memset
#include <iostream>
#include <string>
using std::cin;
using std::cout;
using std::string;
using std::endl;

#define DEST_IP		"74.52.33.82"
#define DEST_ADDR   "google.com"
#define DEST_PORT	80

#define BUFFER_SIZE 4000

string lookup_host(const char *name);
int open_socket();
int close_socket(int sockfd);
int send_packet(int sockfd, string data);
int recieve_packet(int sockfd, string &data);

// Resolve a host name to a dotted-quad IPv4 string.
string lookup_host(const char *name) {

	struct hostent *host;
	
	if ((host=gethostbyname(name)) == NULL) {
		cout << "Could not look up host.\n";
		exit(1);
	}
	
	// Return a copy: inet_ntoa() uses a static buffer, and returning
	// c_str() of a local string would leave a dangling pointer.
	return string(inet_ntoa(*((struct in_addr*) host->h_addr)));
}

int open_socket() {

	int sockfd;
	struct sockaddr_in dest_addr;

	sockfd = socket(PF_INET, SOCK_STREAM, 0);			// sock file descriptor
	
	dest_addr.sin_family = AF_INET;						// host byte order
	dest_addr.sin_port = htons(DEST_PORT);				// convert destination port to network byte order
	dest_addr.sin_addr.s_addr = inet_addr(lookup_host(DEST_ADDR).c_str());	// convert dotted-quad string to a 32-bit address
	memset(&(dest_addr.sin_zero), '\0', 8);				// clear out junk in rest of struct

	// connect to host
	if (connect(sockfd, (struct sockaddr *) &dest_addr, sizeof(dest_addr)) == -1) {
	
		cout << "Could not connect to host (" << DEST_ADDR << ", port " << DEST_PORT << ").\n";
		exit(1);
	}
	
	return sockfd;
}

int close_socket(int sockfd) {

	return close(sockfd);
}

int send_packet(int sockfd, string data) {
	
	int total=0;
	int length = data.length();
	int bytesleft = data.length();
	int n;
	
	while (total < length) {
		
		n = send(sockfd, data.c_str()+total, bytesleft, 0);
		if (n == -1) break;
		total += n;
		bytesleft -= n;
	}
	
	if (n == -1) {
		cout << "Connection closed by host.\n";
		exit(1);
	}

	return 0;
}

int recieve_packet(int sockfd, string &data) {

	int bytes_recieved;
	char buf[BUFFER_SIZE];
	data.clear();
	
	while (1) {
		bytes_recieved = recv(sockfd, buf, BUFFER_SIZE, 0);
		if (bytes_recieved == 0) break;		// peer closed the connection
		if (bytes_recieved < 0) break;		// error
		data.append(buf, bytes_recieved);	// append only the bytes received; buf is not null-terminated
	}
		
	return 0;
}

int main() {

	string request = "GET / HTTP/1.1\nHost: www.google.ca\n\n";
	string response;
	char buffer[BUFFER_SIZE] = {'\0'};
	int bytes_recieved;
	
	int sockfd = open_socket();	// create new sending socket
	send_packet(sockfd, request);
	recieve_packet(sockfd, response);
	close_socket(sockfd);
	
	cout << "Data recieved:\n";
	cout << response << endl;
	
	return 0;
}
I also liked your suggestion about using libcurl; however, I couldn't get the curl functions to resolve when I compiled my program. The only libraries produced when compiling libcurl on Mac OS X were libcurl.3.dylib and libcurl.3.la, and adding these to the Xcode project didn't seem to resolve it. So I'm still trying this method, although it would be nice if I could get libcurl working.

Thanks.

Last edited by joeprogrammer; 12-23-2006 at 07:15 PM.
joeprogrammer is offline   Reply With Quote
Old 12-23-2006, 08:37 PM   #4
int x = *((int *) NULL);
 
Cactus_Hugger's Avatar
 
Join Date: Jul 2003
Location: Banks of the River Styx
Posts: 902
Ah.
First: HTTP header lines end with a carriage return and a newline: "\r\n" in C (or C++). (Google seems to respond either way, though.)
Second: HTTP 1.1 connections are persistent ("Keep-Alive") by default. You could send another request after you're done, so Google won't immediately close the connection. (I would have thought that if you waited long enough, Google would eventually terminate it for you, and your original code would dump the results at that point. But that could be a several-minute wait, which is hardly what you want.)
You can either interpret the data as it comes, decoding it to see when you've got it all, or you can specify "Connection: Close" as a header, which will (should) cause the responder to close the connection after the response. So, the only change you should need:
Code:
string request = "GET / HTTP/1.1\r\nHost: www.google.ca\r\nConnection: Close\r\n\r\n";
Sorry to hear you're having trouble with libcurl - I know nothing about Macs myself. Good luck! (I use libcurl whenever HTTP and C come together in my mind...)

(If you're interested: HTTP's RFC, and relevant sections: General message format and Persistent connections)
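If you'd rather go the "interpret the data as it comes" route, here is a minimal sketch of one way to tell when a response is complete. It only handles the Content-Length case; a chunked response like Google's would need its chunks parsed instead. The names to_lower and response_complete are made up for illustration:

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: lower-case a copy of a string (header names are case-insensitive).
static std::string to_lower(std::string s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Hypothetical helper: true once `data` holds the full header block plus
// Content-Length bytes of body. Returns false if more data is needed, or if
// there is no Content-Length header (e.g. a chunked response).
bool response_complete(const std::string &data)
{
    std::string::size_type header_end = data.find("\r\n\r\n");
    if (header_end == std::string::npos) return false;    // headers not finished yet

    // Keep the CRLF after the last header so every header line is "\r\nname: value\r\n".
    std::string headers = to_lower(data.substr(0, header_end + 2));
    const std::string key = "\r\ncontent-length:";
    std::string::size_type at = headers.find(key);
    if (at == std::string::npos) return false;             // can't tell from length alone

    // strtoul skips leading spaces and stops at the CR that ends the line.
    unsigned long length = std::strtoul(headers.c_str() + at + key.size(), 0, 10);
    return data.size() - (header_end + 4) >= length;
}
```

In the recv() loop you would append each batch of bytes to a string and break as soon as response_complete() returns true, instead of waiting for recv() to return 0.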

Last edited by Cactus_Hugger; 12-23-2006 at 08:43 PM.
Cactus_Hugger is offline   Reply With Quote