Thread: Need help with internet socket program

  1. #1
    User
    Join Date
    Jan 2006
    Location
    Canada
    Posts
    499

    Need help with internet socket program

    Hello, I'm still a newbie at this network socket stuff, so don't get mad at me if this is a simple problem.

    I'm trying to retrieve HTML source from a webpage, and so I use basic network sockets as described in Beej's guide. If you want, I'll post it here, but the code seems fine to me. The problem is that when I try to retrieve a small webpage, it works, but larger ones get cut off. For example, if I try to load Google.com, here is my output from my program:
    Code:
    Data recieved:
    HTTP/1.1 200 OK
    
    Cache-Control: private
    
    Content-Type: text/html
    
    Set-Cookie: PREF=ID=5dbd2c69968fe1ba:TM=1166753338:LM=1166753338:S=f46he7TFtBY5uZrX; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.ca
    
    Server: GWS/2.1
    
    Transfer-Encoding: chunked
    
    Date: Fri, 22 Dec 2006 02:08:58 GMT
    
    
    
    b5b
    
    <html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style><!--
    body,td,a,p,.h{font-family:arial,sans-serif}
    .h{font-size:20px}
    .h{color:#3366cc}
    .q{color:#00c}
    --></style>
    <script defer>
    <!--
    function sf(){document.f.q.focus();}
    // -->
    </script>
    </head>
    .
    .
    .
    </script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Web</b>&nbsp;&nbsp;&nbsp;&nbsp;<a class=q href="http://images.g
    At first, I thought this might be due to a too small input buffer. But increasing the buffer yielded no changes. It seems as though all the data can't fit in a single packet, but if I try doing recv() multiple times, I just get the result posted above over and over again, instead of getting the remaining data. I've Googled and can't seem to find the answer; please help!

    Thanks in advance.

  2. #2
    int x = *((int *) NULL); Cactus_Hugger
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    The data likely won't fit in a single packet, so you should be calling recv() in a loop. (It would help to see some code to know exactly what's happening.) The most common error is ignoring the return value of recv(): it returns how many bytes it actually put into your buffer, which may be fewer than you asked for. Something like:
    Code:
    while(1)
    {
       ret = recv(my_socket, buffer, buffer_size, 0);
       if(ret == 0) break; // All done.
       if(ret < 0) break; // Error.
       // Otherwise, process. buffer contains ret bytes of data.
       // (If you want to printf() buffer, do a buffer[ret] = 0 first,
       //  and be sure buffer is at least buffer_size + 1 bytes.)
    }
    Also, not to deter you from exploring socket programming, but notice that your example above mentions chunked encoding. It's not the only encoding out there, and if you don't decode the data, you'll get weird results. (Like broken JPEGs, in my case, which frustrated me for quite a while.) If you're going to do a lot with just HTTP, look into libcurl; it'll keep you from reinventing wheels.
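    For reference, the "b5b" line in the output above is a chunk size: hexadecimal for 2907 bytes of body data that follow. Below is a minimal sketch of decoding a chunked body, assuming the whole response has already been received and the headers have been stripped off. decode_chunked is a hypothetical helper name, not part of any library, and any trailer headers after the last chunk are ignored:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode an HTTP/1.1 "chunked" body that has already
// been fully received and separated from the headers.
std::string decode_chunked(const std::string& body) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        // Each chunk starts with its size in hex, terminated by CRLF.
        std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos)
            throw std::runtime_error("missing chunk-size line");
        // std::stoul stops at the first non-hex character, so any
        // ";extension" after the size is ignored.
        std::size_t size = std::stoul(body.substr(pos, eol - pos), nullptr, 16);
        pos = eol + 2;
        if (size == 0) break;  // a zero-size chunk marks the end of the body
        if (body.size() - pos < size)
            throw std::runtime_error("truncated chunk");
        out.append(body, pos, size);
        pos += size + 2;  // skip the chunk data and its trailing CRLF
    }
    return out;
}
```

    For example, decode_chunked("5\r\nHello\r\n0\r\n\r\n") returns "Hello".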
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  3. #3
    User
    Join Date
    Jan 2006
    Location
    Canada
    Posts
    499
    Thanks for the tip. I've tried that already, but what happens is I get an endless loop. I guess it's time you had a look at my code, so here it is:
    Code:
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <fcntl.h>
    #include <unistd.h>
    
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    #include <string>
    using std::cin;
    using std::cout;
    using std::string;
    using std::endl;
    
    #define DEST_IP		"74.52.33.82"
    #define DEST_ADDR   "google.com"
    #define DEST_PORT	80
    
    #define BUFFER_SIZE 4000
    
    const char *lookup_host(const char *name);
    int open_socket();
    int close_socket(int sockfd);
    int send_packet(int sockfd, string data);
    int recieve_packet(int sockfd, string &data);
    
    const char *lookup_host(const char *name) {
    
    	struct hostent *host;
    	
    	if ((host=gethostbyname(name)) == NULL) {
    		cout << "Could not look up host.\n";
    		exit(1);
    	}
    	
    	// inet_ntoa() returns a pointer to a static buffer, valid until the next call
    	return inet_ntoa(*((struct in_addr*) host->h_addr));
    }
    
    int open_socket() {
    
    	int sockfd;
    	struct sockaddr_in dest_addr;
    
    	sockfd = socket(PF_INET, SOCK_STREAM, 0);			// sock file descriptor
    	
    	dest_addr.sin_family = AF_INET;						// host byte order
    	dest_addr.sin_port = htons(DEST_PORT);				// convert destination port to network byte order
    	dest_addr.sin_addr.s_addr = inet_addr(lookup_host(DEST_ADDR));		// convert IP address to long type
    	memset(&(dest_addr.sin_zero), '\0', 8);				// clear out junk in rest of struct
    
    	// connect to host
    	if (connect(sockfd, (struct sockaddr *) &dest_addr, sizeof(struct sockaddr)) == -1) {
    	
    		cout << "Could not connect to host (" << DEST_ADDR << ", port " << DEST_PORT << ").\n";
    		exit(1);
    	}
    	
    	return sockfd;
    }
    
    int close_socket(int sockfd) {
    
    	return close(sockfd);
    }
    
    int send_packet(int sockfd, string data) {
    	
    	int total=0;
    	int length = data.length();
    	int bytesleft = data.length();
    	int n;
    	
    	while (total < length) {
    		
    		n = send(sockfd, data.c_str()+total, bytesleft, 0);
    		if (n == -1) break;
    		total += n;
    		bytesleft -= n;
    	}
    	
    	if (n == -1) {
    		cout << "Connection closed by host.\n";
    		exit(1);
    	}
    
    	return 0;
    }
    
    int recieve_packet(int sockfd, string &data) {
    
    	int bytes_recieved;
    	char buf[BUFFER_SIZE] = {'\0'};
    	data=buf;
    	
    	while (1) {
    		bytes_recieved = recv(sockfd, buf, BUFFER_SIZE, 0);
    		if (bytes_recieved == 0) break;
    		if (bytes_recieved < 0) break;
    		data.append(buf, bytes_recieved);	// append only the bytes recv() actually wrote
    	}
    		
    	return 0;
    }
    
    int main() {
    
    	string request = "GET / HTTP/1.1\nHost: www.google.ca\n\n";
    	string response;
    	char buffer[BUFFER_SIZE] = {'\0'};
    	int bytes_recieved;
    	
    	int sockfd = open_socket();	// create new sending socket
    	send_packet(sockfd, request);
    	recieve_packet(sockfd, response);
    	close_socket(sockfd);
    	
    	cout << "Data recieved:\n";
    	cout << response << endl;
    	
    	return 0;
    }
    I also liked your suggestion about using libcurl; however, I couldn't get the curl functions to resolve when I compiled my program. The only libraries produced when compiling libcurl on Mac OS X were libcurl.3.dylib and libcurl.3.la, and adding these to the Xcode project didn't seem to resolve it. So I'm still trying this method, although it would be nice if I could get libcurl working.

    Thanks.
    Last edited by joeprogrammer; 12-23-2006 at 07:15 PM.

  4. #4
    int x = *((int *) NULL); Cactus_Hugger
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Ah.
    First: HTTP headers are separated by a carriage return and a newline: "\r\n" in C (or C++). Google seems to respond either way, though.
    Second: HTTP 1.1 connections are persistent ("Keep-Alive") by default. You could send another request once the response is done, so Google won't immediately close the connection. (I would have thought that if you waited long enough, Google would eventually terminate it for you, and your original program would dump the results at that point. But that could be a wait of several minutes, which is hardly desirable.)
    You can either interpret the data as it comes, decoding it to see when you've got it all, or you can specify "Connection: Close" as a header, which will (should) cause the server to close the connection after the response. So, the only change you should need:
    Code:
    string request = "GET / HTTP/1.1\r\nHost: www.google.ca\r\nConnection: Close\r\n\r\n";
    Sorry to hear you're having trouble with libcurl - I know nothing about Macs myself. Good luck! (I use libcurl whenever HTTP and C come together in my mind...)

    (If you're interested: HTTP's RFC, and relevant sections: General message format and Persistent connections)
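    The first option above (working out from the data itself when the response is complete) could be sketched roughly as follows. response_complete is a hypothetical helper, not something from the original program. It assumes the headers are written exactly as "Content-Length: N" or "Transfer-Encoding: chunked" (any letter case) and that no trailer headers follow the last chunk:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true once `data` (everything received so far)
// holds a complete HTTP/1.1 response, so the caller can stop calling recv()
// without waiting for the server to close the connection.
bool response_complete(const std::string& data) {
    // The headers end at the first blank line.
    std::size_t header_end = data.find("\r\n\r\n");
    if (header_end == std::string::npos) return false;
    std::size_t body_start = header_end + 4;

    // Header names are case-insensitive, so search a lower-cased copy.
    std::string headers = data.substr(0, header_end + 2);
    std::transform(headers.begin(), headers.end(), headers.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (headers.find("\r\ntransfer-encoding: chunked\r\n") != std::string::npos) {
        // Walk the chunks; the body is done at the zero-size chunk + CRLF.
        std::size_t pos = body_start;
        for (;;) {
            std::size_t eol = data.find("\r\n", pos);
            if (eol == std::string::npos) return false;
            unsigned long size = std::strtoul(data.c_str() + pos, nullptr, 16);
            pos = eol + 2;
            if (size == 0) return data.size() - pos >= 2;
            if (data.size() - pos < size + 2) return false;  // chunk not all here yet
            pos += size + 2;
        }
    }

    const std::string key = "\r\ncontent-length:";
    std::size_t at = headers.find(key);
    if (at != std::string::npos) {
        unsigned long length =
            std::strtoul(headers.c_str() + at + key.size(), nullptr, 10);
        return data.size() - body_start >= length;
    }

    // Neither header present: the body only ends when the server closes.
    return false;
}
```

    Calling this on the accumulated string after each recv() lets the loop stop as soon as it returns true, instead of waiting for the server to time out the connection.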
    Last edited by Cactus_Hugger; 12-23-2006 at 08:43 PM.
