| | #1 |
| User Join Date: Jan 2006 Location: Canada
Posts: 496
| Need help with internet socket program I'm trying to retrieve HTML source from a webpage, and so I use basic network sockets as described in Beej's guide. If you want, I'll post it here, but the code seems fine to me. The problem is that when I try to retrieve a small webpage, it works, but larger ones get cut off. For example, if I try to load Google.com, here is my output from my program: Code: Data recieved:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html
Set-Cookie: PREF=ID=5dbd2c69968fe1ba:TM=1166753338:LM=1166753338:S=f46he7TFtBY5uZrX; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.ca
Server: GWS/2.1
Transfer-Encoding: chunked
Date: Fri, 22 Dec 2006 02:08:58 GMT
b5b
<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif}
.h{font-size:20px}
.h{color:#3366cc}
.q{color:#00c}
--></style>
<script defer>
<!--
function sf(){document.f.q.focus();}
// -->
</script>
</head>
.
.
.
</script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Web</b> <a class=q href="http://images.g
Thanks in advance. |
| | #2 |
| int x = *((int *) NULL); Join Date: Jul 2003 Location: Banks of the River Styx
Posts: 902
| Likely the data won't fit in a single packet, and you should probably be calling recv() in a loop. (It would be helpful to see some code to know exactly what's happening.) The most common error is ignoring the return value of recv(): it returns how many bytes it has put into your buffer, which may be fewer than you asked for. Something like: Code: while(1)
{
ret = recv(my_socket, buffer, buffer_size, 0);
if(ret == 0) break; // All done.
if(ret < 0) break; // Error.
// Otherwise, process. buffer contains ret bytes of data.
// (If you want to printf() buffer, do a buffer[ret] = 0 first,
// and be sure buffer is at least buffer_size + 1 bytes if you do.)
}
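For reference, here is a minimal C++ sketch of the loop above that appends exactly the number of bytes recv() reports, so nothing depends on the buffer being null-terminated. The helper name recv_all and the 4096-byte buffer are assumptions made for illustration; they are not from the original post.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Hypothetical helper: read from sockfd until the peer closes the connection.
// Returns true on a clean close (recv() returned 0), false on a recv() error.
bool recv_all(int sockfd, std::string &out) {
    char buf[4096];  // assumed size; any positive size works
    for (;;) {
        ssize_t ret = recv(sockfd, buf, sizeof buf, 0);
        if (ret == 0) return true;           // peer closed: all done
        if (ret < 0) {
            if (errno == EINTR) continue;    // interrupted by a signal: retry
            return false;                    // real error, details in errno
        }
        // Append only the ret bytes that recv() actually wrote.
        out.append(buf, static_cast<std::size_t>(ret));
    }
}
```

Note that this only terminates when the other side closes the connection, so it will block for as long as the peer keeps the socket open.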
__________________ long time; /* know C? */ Unprecedented performance: Nothing ever ran this slow before. Any sufficiently advanced bug is indistinguishable from a feature. Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31. The best way to accelerate an IBM is at 9.8 m/s/s. recursion (re - cur' - zhun) n. 1. (see recursion) |
| | #3 |
| User Join Date: Jan 2006 Location: Canada
Posts: 496
| Thanks for the tip. I've tried that already, but what happens is that I get an endless loop. I guess it's time you had a look at my code, so here it is: Code: #include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdlib>   // exit()
#include <cstring>   // memset()
#include <iostream>
#include <string>
using std::cin;
using std::cout;
using std::string;
using std::endl;
#define DEST_IP "74.52.33.82"
#define DEST_ADDR "google.com"
#define DEST_PORT 80
#define BUFFER_SIZE 4000
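char *lookup_host(const char *name); // prototype now matches the definition below
int open_socket();
int close_socket(int sockfd);
int send_packet(int sockfd, string data);
int recieve_packet(int sockfd, string &data);
char *lookup_host(const char *name) {
struct hostent *host;
static string ip; // static, so the pointer returned below stays valid after return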
if ((host=gethostbyname(name)) == NULL) {
cout << "Could not look up host.\n";
exit(1);
}
ip = inet_ntoa(*((struct in_addr*) host->h_addr));
return (char *)ip.c_str();
}
int open_socket() {
int sockfd;
struct sockaddr_in dest_addr;
sockfd = socket(PF_INET, SOCK_STREAM, 0); // sock file descriptor
dest_addr.sin_family = AF_INET; // host byte order
dest_addr.sin_port = htons(DEST_PORT); // convert destination port to network byte order
dest_addr.sin_addr.s_addr = inet_addr(lookup_host(DEST_ADDR)); // convert IP address to long type
memset(&(dest_addr.sin_zero), '\0', 8); // clear out junk in rest of struct
// connect to host
if (connect(sockfd, (struct sockaddr *) &dest_addr, sizeof(struct sockaddr)) == -1) {
cout << "Could not connect to host (" << DEST_ADDR << ", port " << DEST_PORT << ").\n";
exit(1);
}
return sockfd;
}
int close_socket(int sockfd) {
return close(sockfd);
}
int send_packet(int sockfd, string data) {
int total=0;
int length = data.length();
int bytesleft = data.length();
int n;
while (total < length) {
n = send(sockfd, data.c_str()+total, bytesleft, 0);
if (n == -1) break;
total += n;
bytesleft -= n;
}
if (n == -1) {
cout << "Connection closed by host.\n";
exit(1);
}
return 0;
}
int recieve_packet(int sockfd, string &data) {
int bytes_recieved;
char buf[BUFFER_SIZE] = {'\0'};
data=buf;
while (1) {
bytes_recieved = recv(sockfd, buf, BUFFER_SIZE, 0);
if (bytes_recieved == 0) break;
if (bytes_recieved < 0) break;
data.append(buf, bytes_recieved); // append only the bytes recv() wrote; buf is not null-terminated
}
return 0;
}
int main() {
string request = "GET / HTTP/1.1\nHost: www.google.ca\n\n";
string response;
char buffer[BUFFER_SIZE] = {'\0'};
int bytes_recieved;
int sockfd = open_socket(); // create new sending socket
send_packet(sockfd, request);
recieve_packet(sockfd, response);
close_socket(sockfd);
cout << "Data recieved:\n";
cout << response << endl;
return 0;
}
Thanks. Last edited by joeprogrammer; 12-23-2006 at 07:15 PM. |
| joeprogrammer is offline | |
| | #4 |
| int x = *((int *) NULL); Join Date: Jul 2003 Location: Banks of the River Styx
Posts: 902
| Ah. First: HTTP header lines are terminated by a carriage return and a newline: "\r\n" in C (or C++). (Although Google seems to respond either way.) Second: HTTP 1.1 connections are persistent ("Keep-Alive") by default. You could send another request after you're done, so Google won't immediately close the connection, and that is why your recv() loop never sees a return value of 0. (I would have thought that if you waited long enough, Google would eventually terminate it for you, and your original program would, at that point, dump the results. But that could be a wait of several minutes, which is hardly desirable.) You can either interpret the data as it comes, decoding it to see when you've got it all, or you can specify "Connection: Close" as a header, which will (should) cause the responder to close the connection after the response. So, the only change you should need: Code: string request = "GET / HTTP/1.1\r\nHost: www.google.ca\r\nConnection: Close\r\n\r\n"; (If you're interested: HTTP's RFC, and relevant sections: General message format and Persistent connections)
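If the request is assembled in more than one place, a small helper keeps the "\r\n" line endings and the "Connection: Close" header from being forgotten. This is only a sketch; the function name build_get_request and its parameters are made up for illustration and are not part of the thread's program.

```cpp
#include <string>

// Hypothetical helper: build a GET request whose lines end in CRLF and
// which asks the server to close the connection after the response.
std::string build_get_request(const std::string &host, const std::string &path) {
    std::string request;
    request += "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    request += "Connection: Close\r\n";
    request += "\r\n";  // the empty line ends the header section
    return request;
}
```

Calling build_get_request("www.google.ca", "/") yields the same string as the one-line change above.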
Last edited by Cactus_Hugger; 12-23-2006 at 08:43 PM. |
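On the other option, interpreting the data as it comes: the response in post #1 carries "Transfer-Encoding: chunked", and the "b5b" line is a chunk size in hexadecimal (0xb5b = 2907 bytes). Below is a rough sketch of a decoder for a body in that encoding, assuming the headers have already been split off at the first empty line. The name decode_chunked is hypothetical, and the sketch ignores any trailer headers after the last chunk.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: decode a chunked body (the bytes that follow the
// empty line ending the headers). Returns true once the terminating
// zero-size chunk has been seen; returns false if the data is incomplete
// or malformed, in which case out holds only the chunks decoded so far.
bool decode_chunked(const std::string &body, std::string &out) {
    std::size_t pos = 0;
    out.clear();
    for (;;) {
        std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos) return false;     // size line incomplete
        const std::string size_line = body.substr(pos, eol - pos);
        char *end = 0;
        // The size is hexadecimal; strtoul stops at any ";extension" suffix.
        // (strtoul is lenient about whitespace and signs; a strict parser
        // would reject those.)
        unsigned long size = std::strtoul(size_line.c_str(), &end, 16);
        if (end == size_line.c_str()) return false;     // no hex digits found
        pos = eol + 2;
        if (size == 0) return true;                     // last chunk reached
        const std::size_t avail = body.size() - pos;
        if (size > avail || avail - size < 2) return false;  // data + CRLF not all here yet
        out.append(body, pos, size);
        pos += size + 2;                                // skip data and its CRLF
    }
}
```

A receive loop could call this on the accumulated body after each recv() and stop reading as soon as it returns true, instead of waiting for the server to close the connection.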