| | #1 |
| Registered User Join Date: Jan 2005 Location: Estonia
Posts: 131
| Why do I get partial web pages with recv()?
I am trying to receive the full HTML page of www.google.com, but I only get part of it. The thread "Need help with internet socket program" describes a similar problem, and the "solution" there was to use curl. I don't want to use curl; I want to get the page with the recv() function.
I am sending this request to Google:
Code:
GET / HTTP/1.1<crlf>
Host: www.google.com<crlf>
Connection: close<crlf>
<crlf>
The <crlf>s are replaced with \r\n, of course.
Here's the preparation code:
Code:
int sock = socket(AF_INET, SOCK_STREAM, 0);
int status = 1;
ioctl(sock, FIONBIO, &status); //put the socket in non-blocking mode.
sockaddr_in socket_address; //I fill in the port, host and address family here.
Here's the main loop:
Code:
char buffer[1000];
string whole_page;
while (1)
{
    int bytes = recv(sock, buffer, 1000);
    if (bytes == -1)
    {
        if (errno == EAGAIN) continue; //would block
        else return;
    }
    if (bytes == 0)
    { //Google disconnected me?
        return;
    }
    whole_page += buffer;
}
Does "bytes == 0" mean that google.com terminated the connection? If not, how can I tell when the connection has been terminated? |
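A minimal sketch of building the request above with explicit `\r\n` line endings, plus a send loop, assuming POSIX sockets in blocking mode (with `FIONBIO` set, as in the post, `send()` can also fail with `EAGAIN`, which this sketch does not handle). The names `build_get_request` and `send_all` are hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Builds the request from the post: every header line ends in "\r\n",
// and an empty line ("\r\n" on its own) marks the end of the headers.
std::string build_get_request(const std::string& host, const std::string& path = "/")
{
    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Connection: close\r\n";  // ask the server to close the connection after the response
    req += "\r\n";                   // blank line: end of the header block
    return req;
}

// Like recv(), send() may transfer fewer bytes than requested, so keep
// calling it until the whole string is out. Assumes a *blocking* socket.
// Returns false on error; errno is then set by send().
bool send_all(int sock, const std::string& data)
{
    std::size_t sent = 0;
    while (sent < data.size()) {
        ssize_t n = send(sock, data.data() + sent, data.size() - sent, 0);
        if (n == -1) {
            if (errno == EINTR) continue;  // interrupted by a signal: try again
            return false;
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

After a successful `connect()`, the call would be `send_all(sock, build_get_request("www.google.com"))`.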
| hardi | |
| | #2 | ||
| Registered User Join Date: Jan 2005 Location: Estonia
Posts: 131
| A quote from http://www.madwizard.org/view.php?pa...pter6&lang=cpp Quote:
A quote from: http://www.hmug.org/man/2/recv.php Quote:
| ||
| hardi | |
| | #3 |
| and the hat of Jobseeking Join Date: Aug 2001 Location: The edge of the known universe
Posts: 21,699
| > Can I assume that bytes == 0 means that the connection is terminated?
I think getting a return of 0 means you need to go check the value of errno. If you're using a non-blocking socket, then you should get EAGAIN, indicating that the connection is still alive but there is no data at the moment. A zero return on a blocking socket is end of connection.
Also, look at select() to help you determine whether there is any data to be read before you read it. |
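A minimal sketch of the select() suggestion, assuming POSIX sockets; `wait_readable` is a hypothetical helper name. Instead of spinning on EAGAIN, the caller waits until the socket is readable or a timeout expires, and only then calls recv().

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

// Waits up to timeout_ms milliseconds for `sock` to become readable.
// Returns 1 if readable, 0 on timeout, -1 on error (errno set by select()).
// "Readable" also covers end of stream: the following recv() then returns 0.
int wait_readable(int sock, int timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(sock, &readfds);

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // The first argument is the highest descriptor number plus one.
    return select(sock + 1, &readfds, nullptr, nullptr, &tv);
}
```

In the loop from the question, `if (wait_readable(sock, 5000) == 1)` would precede each `recv()`. Note that select() only works with descriptors below `FD_SETSIZE`; poll() avoids that limit.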
| Salem | |
| | #4 | |
| int x = *((int *) NULL); Join Date: Jul 2003 Location: Banks of the River Styx
Posts: 902
| I believe that if recv() returns 0, the connection is closed: Quote:
Code:
int bytes = recv(sock, buffer, 1000);
...
whole_page += buffer;
Second, you cannot just append your buffer to a C++ string like that. std::string's += operator appends a C string, and that string must be null-terminated. recv() does not append a null to the buffer, so you must do so yourself before passing the buffer to anything that expects a C string. (This also means you should pass 1 byte less than the total size of your buffer to recv(), to leave room for the null you will append.) Something like:
Code:
ret = recv(my_socket, my_buffer, my_buffers_size - 1, 0);
// Error checking goes here: stop on ret == -1 (error) or ret == 0 (connection closed).
my_buffer[ret] = 0;
my_cppstring += my_buffer;
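A sketch of the same loop without a null terminator, assuming a blocking POSIX socket; `recv_all` is a hypothetical name. `std::string::append(const char*, size_t)` copies exactly the number of bytes recv() reported, so no room has to be reserved for a '\0', and '\0' bytes inside binary data (images, for instance) are not cut off.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Reads from a *blocking* socket until the peer closes the connection.
// Returns true if the stream ended normally (recv() returned 0),
// false on a socket error. The data read so far is left in `out`.
bool recv_all(int sock, std::string& out)
{
    char buffer[1000];
    for (;;) {
        ssize_t n = recv(sock, buffer, sizeof buffer, 0);
        if (n > 0) {
            // Append exactly n bytes; embedded '\0' bytes are kept.
            out.append(buffer, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return true;               // orderly shutdown by the peer
        } else if (errno != EINTR) {
            return false;              // real error; EINTR just means retry
        }
    }
}
```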
Last edited by Cactus_Hugger; 12-27-2006 at 08:03 PM. | |
| Cactus_Hugger | |
| | #5 | |
| Registered User Join Date: Jan 2005 Location: Estonia
Posts: 131
| Quote:
But I didn't know that the buffer does not contain a '\0' character at the end; thanks for pointing that out. | |
| hardi | |
| | #6 |
| Registered User Join Date: Mar 2005 Location: Juneda
Posts: 229
| With Winsock, when recv() returns 0 or fails with WSAECONNRESET, the transfer has ended; I suppose it is similar with Linux sockets. There is also a way to get the last few unexpected bytes while closing the connection (I don't know if that is your problem, but maybe it will help): take a look at the function 'ShutdownConnection(socket)' in http://tangentsoft.net/wskfaq/exampl...cs/ws-util.cpp. Niara |
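A POSIX sketch of the close sequence described above: shut down the sending side, collect whatever the peer still sends, then close. This is not the code of the linked `ShutdownConnection` function (which is written for Winsock); `shutdown_connection` is a hypothetical name, and a blocking socket is assumed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Graceful close for a *blocking* POSIX socket:
//   1. shutdown(SHUT_WR) tells the peer we will send nothing more,
//   2. keep reading, collecting bytes the peer still sends, until recv() returns 0,
//   3. close() the descriptor.
// Returns true if the peer closed cleanly; false on an error such as ECONNRESET
// (roughly the POSIX counterpart of WSAECONNRESET). The socket is closed either way.
bool shutdown_connection(int sock, std::string& tail)
{
    bool ok = (shutdown(sock, SHUT_WR) == 0);
    char buffer[512];
    while (ok) {
        ssize_t n = recv(sock, buffer, sizeof buffer, 0);
        if (n > 0) {
            tail.append(buffer, static_cast<std::size_t>(n));
        } else if (n == 0) {
            break;                       // peer has closed its side too
        } else if (errno != EINTR) {
            ok = false;                  // real error; EINTR just means retry
        }
    }
    close(sock);
    return ok;
}
```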
| Niara | |
| | #7 | |
| Registered User Join Date: Jan 2005 Location: Estonia
Posts: 131
| Quote:
| |
| hardi | |
| | #8 |
| Registered User Join Date: Jan 2005 Location: Estonia
Posts: 131
| OK, it's all right now. I got the loop working. Thanks, everyone!
But now I have another problem. I was able to receive the full source code of www.google.ee, but there are some weird lines that I think should not be there. I put the source at http://haxxx.hyena.pri.ee/crap.txt and, as you can see, the 9th line and the last line contain "aae" and "0" respectively. But when I open www.google.ee in my browser, I don't get such lines. Here's the loop:
Code:
char source[2000];
string full_source;
ioctl(sock_, FIONBIO, &status);
while (1)
{
    usleep(100000); //sleep 100 milliseconds
    int ret = recv(sock_, source, 2000 - 1, 0);
    if (ret == -1)
    {
        if (errno == EAGAIN)
        { //would block
            cout << "------Would block------" << endl;
            continue;
        }
        else
        {
            cout << "------A fatal error occurred-----" << endl;
            cout << "Errno = " << errno << " - " << strerror(errno) << endl;
            break;
        }
    }
    if (ret == 0)
    { //the connection was shut down
        cout << "------Connection was shut down------" << endl;
        break;
    }
    //We are here if ret > 0, thus we got some data!
    source[ret] = '\0'; //Add a '\0' to the end of the received data.
    cout << "------I got some data------:" << endl;
    cout << source << endl;
    full_source += source;
}
Last edited by hardi; 12-28-2006 at 07:47 AM. |
| hardi | |
| | #9 |
| Cat without Hat Join Date: Apr 2003
Posts: 8,492
| That would be the chunked transfer encoding. Read the HTTP spec for more information.
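A minimal sketch of a chunked-body decoder, assuming the whole response has already been read and the headers have been split off; `decode_chunked` is a hypothetical name. In chunked encoding each chunk is preceded by its length in hexadecimal on a line of its own, and a zero-length chunk ends the body. That matches the stray lines in the post above: `aae` is 0xaae = 10×256 + 10×16 + 14 = 2734 bytes, and `0` is the final chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes a body sent with "Transfer-Encoding: chunked" (the bytes after the
// blank line that ends the headers). Each chunk looks like
//     <size in hex>[;extensions]\r\n<exactly size bytes>\r\n
// and a chunk of size 0 ends the body; trailer headers after it are ignored.
// Throws std::runtime_error on truncated or badly framed input; std::stoul
// throws std::invalid_argument or std::out_of_range on a bad size field.
std::string decode_chunked(const std::string& body)
{
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos)
            throw std::runtime_error("missing chunk-size line");

        // Drop any ";extension" part, then parse the size as hexadecimal.
        std::string size_field = body.substr(pos, eol - pos);
        size_field = size_field.substr(0, size_field.find(';'));
        unsigned long size = std::stoul(size_field, nullptr, 16);  // "aae" -> 2734

        if (size == 0)
            return out;  // last chunk: the body is complete

        std::size_t data_start = eol + 2;
        std::size_t remaining = body.size() - data_start;
        if (size > remaining || remaining - size < 2)
            throw std::runtime_error("truncated chunk");
        if (body.compare(data_start + size, 2, "\r\n") != 0)
            throw std::runtime_error("chunk data not followed by CRLF");

        out.append(body, data_start, size);
        pos = data_start + size + 2;
    }
}
```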
| CornedBee |
| | #10 |
| int x = *((int *) NULL); Join Date: Jul 2003 Location: Banks of the River Styx
Posts: 902
| Chunked transfer encodings gave me a fun time when I first encountered them. (In my case I was working with JPEGs, so the chunk markers completely corrupted the result until it was decoded.) To elaborate, see this section (and perhaps the one above it) of the HTTP specification.
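Before a chunked body can be decoded, the client has to find where the headers end and check whether the server actually used chunked encoding. A simplified sketch, assuming the full response is in memory; `HttpResponse`, `split_response` and `is_chunked` are hypothetical names. It ignores obsolete header line folding and treats any `chunked` token in a Transfer-Encoding header as chunked.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct HttpResponse {    // hypothetical type for this sketch
    std::string headers; // status line + header lines, without the final blank line
    std::string body;    // everything after the blank line, possibly still chunked
};

// Splits a raw response at the first empty line ("\r\n\r\n").
// Returns false if the header block is incomplete.
bool split_response(const std::string& raw, HttpResponse& out)
{
    std::size_t end = raw.find("\r\n\r\n");
    if (end == std::string::npos) return false;
    out.headers = raw.substr(0, end);
    out.body = raw.substr(end + 4);
    return true;
}

// Returns true if a Transfer-Encoding header mentions "chunked".
// Header names are case-insensitive, so compare in lower case.
bool is_chunked(const std::string& headers)
{
    std::string lower(headers);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string key = "\r\ntransfer-encoding:";  // headers follow the status line
    std::size_t pos = 0;
    while ((pos = lower.find(key, pos)) != std::string::npos) {
        std::size_t start = pos + key.size();
        std::size_t eol = lower.find("\r\n", start);
        // If eol is npos, substr() clamps the count to the end of the string.
        if (lower.substr(start, eol - start).find("chunked") != std::string::npos)
            return true;
        pos = start;
    }
    return false;
}
```

If the body is not chunked, a `Content-Length` header or, with `Connection: close`, the end of the connection marks where it stops. Sending the request as HTTP/1.0 is another way around the issue, since servers must not use a transfer coding in responses to HTTP/1.0 requests.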
| Cactus_Hugger |
| | #11 |
| Registered User Join Date: Jan 2005 Location: Estonia
Posts: 131
| Isn't there a good tutorial on this? Those specifications are so complicated, and there is a tremendous lack of (good) examples. |
| hardi | |
| | #12 |
| Cat without Hat Join Date: Apr 2003
Posts: 8,492
| I don't think there is. That's why there are libraries such as cURL. To put it bluntly, either you understand specifications, or you have no business trying to implement them.
| CornedBee |