C Board  

Old 12-27-2006, 10:48 AM   #1
Registered User
 
Join Date: Jan 2005
Location: Estonia
Posts: 131
Why do I get partial web-pages with recv?

OS: Linux Ubuntu 6.10


I am trying to receive the full HTML page of www.google.com, but I only get a partial page. The thread "Need help with internet socket program" describes a similar problem, and the "solution" there was to use curl.

I don't want to use curl; I want to get the page with the recv() function.

I am sending this request to google:

Code:
GET / HTTP/1.1<crlf>
Host: www.google.com<crlf>
Connection: close<crlf>
<crlf>

The <crlf>-s are substituted with \r\n, of course.
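As a rough illustration of the request described above, the text could be assembled like this. This is only a sketch: build_get_request is a hypothetical helper name that does not appear in the thread, and it only builds the string, it does not send it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the thread): builds a minimal HTTP/1.1
// GET request for the root path of `host`, with every line ended by CRLF.
std::string build_get_request(const std::string& host)
{
    assert(!host.empty());               // HTTP/1.1 requires a Host header
    std::string request;
    request += "GET / HTTP/1.1\r\n";     // request line
    request += "Host: " + host + "\r\n"; // which site we want
    request += "Connection: close\r\n";  // ask the server to close when done
    request += "\r\n";                   // empty line ends the header section
    return request;
}
```

Calling build_get_request("www.google.com") yields the four lines shown in the post, each terminated by \r\n.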

Here's the preparation code:
Code:
int sock = socket(AF_INET, SOCK_STREAM, 0);
int status = 1;
ioctl(sock, FIONBIO, &status); //put the socket in non-blocking mode.

sockaddr_in socket_address;
//I am doing the evaluation here(port, host and addr family).
//Connect the socket...

Here's the main loop:
Code:
char buffer[1000];
string whole_page;

while (1)
{
    int bytes = recv(sock, buffer, 1000);
    if (bytes == -1)
    {
        if (errno == EAGAIN) continue; //would block
        else return;
    }
    if (bytes == 0)
    {   //Google disconnected me?
        return;
    }
    whole_page += buffer;
}
So why does this code only get part of the page source?


Does "bytes == 0" mean that google.com terminated the connection?
If not, then how can I know when the connection gets terminated?
hardi is offline   Reply With Quote
Old 12-27-2006, 11:09 AM   #2
Registered User
 
Join Date: Jan 2005
Location: Estonia
Posts: 131
A quote from http://www.madwizard.org/view.php?pa...pter6&lang=cpp
Quote:
Recv too will block if no data is available immediately and return if some has arrived. The return value of recv is either 0, SOCKET_ERROR or the number of bytes read. SOCKET_ERROR of course indicates a socket error, 0 indicates closure of the connection.
It's a tutorial on Winsock. Does that == 0 rule apply to Linux sockets too?

A quote from:
http://www.hmug.org/man/2/recv.php

Quote:
These calls return the number of bytes received, or -1 if an error
occurred.
It says nothing about connection closure. Can I assume that bytes == 0 means that the connection is terminated?
hardi is offline   Reply With Quote
Old 12-27-2006, 11:40 AM   #3
and the hat of Jobseeking
 
Salem's Avatar
 
Join Date: Aug 2001
Location: The edge of the known universe
Posts: 21,699
> Can I assume that bytes == 0 means that the connection is terminated?
I think getting a return of 0 means you need to go check the value of errno.

If you're using a non-blocking socket, then you should get EAGAIN indicating that the connection is still alive, but there is no data at the moment.

A zero return on a blocking socket is end of connection.

Also, look at select() to help you determine if there is any data to be read, before you read it.
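
For reference, on Linux the "no data yet" case on a non-blocking socket is reported as a return value of -1 with errno set to EAGAIN (or EWOULDBLOCK), while a return value of 0 from a stream socket means the peer closed the connection, in blocking and non-blocking mode alike. The sketch below shows one way to tell the cases apart and to wait with select() instead of spinning. The names RecvResult, try_recv, wait_readable and set_nonblocking are made up for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum class RecvResult { Data, WouldBlock, Closed, Error };

// Performs one recv() on a stream socket and classifies the outcome.
RecvResult try_recv(int fd, std::string& out)
{
    char buf[1024];
    ssize_t n = recv(fd, buf, sizeof buf, 0);
    if (n > 0)
    {
        out.append(buf, static_cast<std::size_t>(n)); // append exactly n bytes
        return RecvResult::Data;
    }
    if (n == 0)
        return RecvResult::Closed;                    // peer closed the connection
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return RecvResult::WouldBlock;                // alive, but nothing to read yet
    return RecvResult::Error;
}

// Waits up to timeout_ms for fd to become readable (data or end of stream).
bool wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);               // select() limitation
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, &readfds, nullptr, nullptr, &tv) > 0;
}

// Puts fd into non-blocking mode using fcntl().
bool set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}
```

A receive loop would call wait_readable() first and then try_recv(), stopping on Closed or Error and simply waiting again on WouldBlock.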
__________________
If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.

Old 12-27-2006, 07:56 PM   #4
int x = *((int *) NULL);
 
Cactus_Hugger's Avatar
 
Join Date: Jul 2003
Location: Banks of the River Styx
Posts: 902
I believe that if recv() returns 0, the connection is closed:
Quote:
Originally Posted by man recv(2)
If no messages are available at the socket, the receive calls wait for a message to arrive, unless the socket is nonblocking (see fcntl(2)) in which case the value -1 is returned and the external variable errno set to EAGAIN.
There are bigger errors, however.
Code:
int bytes = recv(sock, buffer, 1000);
...
whole_page += buffer;
First, recv() takes four arguments, not three. The last one is flags for recv(), usually 0.

Second, you cannot just append your buffer to a C++ string like that. std::string's += operator appends a C string, and that string must be NUL-terminated. recv() does not append a NUL to the buffer, so you must do so yourself before passing the buffer to anything that expects C strings. (Which also means that you should pass 1 byte less than the total size of your buffer to recv(), to save room for the NUL you will append.) Something like:
Code:
ret = recv(my_socket, my_buffer, my_buffers_size - 1, 0);
// Error checking: continue only if ret > 0 (-1 is an error, 0 means the peer closed).
my_buffer[ret] = 0;
my_cppstring += my_buffer;
And be sure you return your string somehow when you're done... Google will disconnect you after sending the data.
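
An alternative to NUL-terminating the buffer is to append exactly the number of bytes recv() reported, which also keeps any zero bytes in the data (relevant for binary content such as images). The following is a minimal sketch under that assumption; recv_until_closed is a hypothetical name, and the function works on both blocking and non-blocking sockets.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Reads from a connected stream socket until the peer closes the connection.
// Returns true on an orderly close, false on a socket error.
// Everything received is appended to `out`.
bool recv_until_closed(int fd, std::string& out)
{
    assert(fd >= 0);
    char buffer[1000];
    for (;;)
    {
        ssize_t bytes = recv(fd, buffer, sizeof buffer, 0);
        if (bytes > 0)
        {
            // Append exactly `bytes` bytes: no terminator needed,
            // and embedded zero bytes are preserved.
            out.append(buffer, static_cast<std::size_t>(bytes));
        }
        else if (bytes == 0)
        {
            return true;                 // peer closed the connection
        }
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
        {
            // Non-blocking socket with nothing to read: sleep until readable.
            pollfd pfd;
            pfd.fd = fd;
            pfd.events = POLLIN;
            pfd.revents = 0;
            poll(&pfd, 1, -1);
        }
        else if (errno != EINTR)
        {
            return false;                // real error; errno says which one
        }
    }
}
```

With "Connection: close" in the request, the server closing the connection marks the end of the response, so the loop ends on its own once the whole page has arrived.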
__________________
long time; /* know C? */
Unprecedented performance: Nothing ever ran this slow before.
Any sufficiently advanced bug is indistinguishable from a feature.
Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
The best way to accelerate an IBM is at 9.8 m/s/s.
recursion (re - cur' - zhun) n. 1. (see recursion)

Last edited by Cactus_Hugger; 12-27-2006 at 08:03 PM.
Old 12-28-2006, 05:25 AM   #5
Registered User
 
Join Date: Jan 2005
Location: Estonia
Posts: 131
I wrote this code here in the forum without compiling, thus the 4th argument was accidentally left out. np with that.

But I didn't know that the buffer does not contain a '\0' character at the end of it - thanks for pointing that out
hardi is offline   Reply With Quote
Old 12-28-2006, 05:30 AM   #6
Registered User
 
Join Date: Mar 2005
Location: Juneda
Posts: 229
On Winsock, when the number of bytes received is 0, or the error is 'WSAECONNRESET', it means that the transfer has ended; I suppose it is similar on Linux sockets. There is also a way to pick up any last unexpected bytes while closing the connection (I don't know if that is your problem, but maybe it will help): take a look at the function 'ShutdownConnection(socket)' in http://tangentsoft.net/wskfaq/exampl...cs/ws-util.cpp.

Niara
Old 12-28-2006, 06:03 AM   #7
Registered User
 
Join Date: Jan 2005
Location: Estonia
Posts: 131
I don't think that will help, as I specified "Connection: close" in the GET request. Thus, when all the data has been sent, Google should terminate the connection, but that doesn't happen.
hardi is offline   Reply With Quote
Old 12-28-2006, 07:44 AM   #8
Registered User
 
Join Date: Jan 2005
Location: Estonia
Posts: 131
Ok it's all right now. I got the loop working. Thanks everyone

But now I have another problem.

I was able to receive the full source code of www.google.ee, but there are some weird lines that I think should not be there.
I put the source to http://haxxx.hyena.pri.ee/crap.txt

As you can see, the 9th and the last line contain respectively "aae" and "0".
But when I open www.google.ee in my browser, I don't get such lines.


Here's the loop:
Code:
char source[2000];
string full_source;

ioctl(sock_, FIONBIO, &status); //put the socket in non-blocking mode
while (1)
{
    usleep(100000); //sleep 100 milliseconds
    int ret = recv(sock_, source, sizeof(source) - 1, 0); //leave room for the '\0'
    if (ret == -1)
    {
        if (errno == EAGAIN)
        {   //would block
            cout << "------Would block------" << endl;
            continue;
        }
        else
        {
            cout << "------A fatal error occurred-----" << endl;
            cout << "Errno = " << errno << " - " << strerror(errno) << endl;
            break;
        }
    }
    if (ret == 0)
    {   //the connection was shut down
        cout << "------Connection was shut down------" << endl;
        break;
    }
    //We are here if ret > 0, thus we got some data!
    source[ret] = '\0'; //Add a '\0' to the end of the received data.
    cout << "------I got some data------:" << endl;
    cout << source << endl;

    full_source += source;
}
Btw: crap.txt contains the data from full_source.txt, not from the console output.

Last edited by hardi; 12-28-2006 at 07:47 AM.
hardi is offline   Reply With Quote
Old 12-28-2006, 08:02 AM   #9
Cat without Hat
 
CornedBee's Avatar
 
Join Date: Apr 2003
Posts: 8,492
That would be the chunked transfer encoding. Read the HTTP spec for more information.
__________________
All the buzzt!
CornedBee

"There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
- Flon's Law
Old 12-28-2006, 10:26 AM   #10
int x = *((int *) NULL);
 
Cactus_Hugger's Avatar
 
Join Date: Jul 2003
Location: Banks of the River Styx
Posts: 902
Chunked transfer encodings gave me a fun time when I first encountered them. (Except I was working with JPEGs, so they completely corrupted the result until decoded.)

To elaborate, see this section (and perhaps the one above it) of the HTTP Protocol spec.
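
In chunked transfer encoding, the body arrives as a series of chunks, each preceded by a line giving its size in hexadecimal, and the series ends with a chunk of size 0. That would explain the "aae" line (hex for 2734 bytes) and the final "0". Below is a minimal decoding sketch, assuming the headers have already been split off at the first empty line and the complete body is in memory. decode_chunked is a hypothetical name, and chunk extensions and trailers are simply skipped.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Decodes a complete chunked HTTP message body into `decoded`.
// Returns false if the body is malformed or truncated.
bool decode_chunked(const std::string& body, std::string& decoded)
{
    std::size_t pos = 0;
    for (;;)
    {
        // Each chunk starts with a line: hex size, optional ";extension", CRLF.
        std::size_t line_end = body.find("\r\n", pos);
        if (line_end == std::string::npos)
            return false;

        std::size_t size = 0;
        int digits = 0;
        while (pos < line_end && std::isxdigit(static_cast<unsigned char>(body[pos])))
        {
            char c = body[pos++];
            int value = std::isdigit(static_cast<unsigned char>(c))
                            ? c - '0'
                            : std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
            size = size * 16 + static_cast<std::size_t>(value);
            if (++digits > 8)
                return false;            // refuse absurdly large sizes
        }
        if (digits == 0)
            return false;                // size line did not start with hex digits

        pos = line_end + 2;              // skip the rest of the size line
        if (size == 0)
            return true;                 // last chunk; any trailers are ignored

        std::size_t remaining = body.size() - pos;
        if (remaining < size || remaining - size < 2)
            return false;                // chunk data or its CRLF is missing
        decoded.append(body, pos, size); // copy the chunk data
        if (body.compare(pos + size, 2, "\r\n") != 0)
            return false;                // chunk data must be followed by CRLF
        pos += size + 2;
    }
}
```

For example, the body "5\r\nhello\r\n0\r\n\r\n" decodes to "hello". A client should only apply this when the response headers contain "Transfer-Encoding: chunked". Chunked encoding is an HTTP/1.1 feature, so sending the request as HTTP/1.0 is another way to avoid it.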
Old 12-28-2006, 01:31 PM   #11
Registered User
 
Join Date: Jan 2005
Location: Estonia
Posts: 131
Isn't there a good tutorial on this?
Those specifications are so complicated and there is a tremendous lack of (good) examples.
hardi is offline   Reply With Quote
Old 12-28-2006, 02:50 PM   #12
Cat without Hat
 
CornedBee's Avatar
 
Join Date: Apr 2003
Posts: 8,492
I don't think there is. That's why there are libraries such as cURL.

To put it bluntly, either you understand specifications, or you have no business trying to implement them.
All times are GMT -6.