TCP server not detecting broken connection

**berabin** · 07-04-2011

I am writing a server application in C on a Linux machine that listens for TCP connections and transfers data. I am trying to detect on the server side when the connection is broken. The closest thing I got to work was checking the return value from sending data. The server's main job is to read data from the socket, but it also sends data back to the client periodically to test whether the connection is still up. I look at the return value from send() to determine if the connection is broken, e.g.

Code:

int ret = send(session->clientSocket, &data[sentCnt], count - sentCnt, MSG_NOSIGNAL);

I found that this does not immediately return an error when the connection is broken. The reason is that even though the connection is broken, send() still succeeds because it is able to put the data in the network buffer. To fix the issue I did the following:

Code:

    //set send timeout
    struct timeval timeout;
    timeout.tv_sec = 4;
    timeout.tv_usec = 0;

    //apply send timeout socket options
    setsockopt(tcp->socket, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout));

    //set output buffer size so that send blocks
    int buffersize = 2;
    setsockopt(tcp->socket, SOL_SOCKET, SO_SNDBUF, &buffersize, sizeof(buffersize));

Setting the timeout and the buffer size should ensure that send() times out if more than 2 bytes are sent and the connection is broken.
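As a side note on the buffer size: on Linux the kernel doubles the value passed to SO_SNDBUF and enforces a minimum (see socket(7)), so a request for 2 bytes is unlikely to produce a 2-byte buffer. A minimal sketch for checking what was actually applied, assuming a helper named effective_sndbuf (a hypothetical name, not part of the code above):

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: request a send-buffer size, then read back the
 * size the kernel actually applied.
 * Returns the effective size in bytes, or -1 on error (errno is set). */
int effective_sndbuf(int fd, int requested)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```

Logging the returned value right after the setsockopt() call shows how much data send() can still queue before it starts to block.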

I have tested this and it worked, i.e. when I pull the plug from the modem the server detects that the connection is broken fairly quickly.
The server runs as a virtual machine somewhere in the US, and the client I'm using to test is in Sydney, Australia. I tested the client on my Telstra cable connection and it worked. I also tested on my Optus 3G wireless broadband connection and it also worked. The problem is that when I tested on my Telstra 3G wireless broadband it DID NOT work. The server was happily thinking that it was sending data to the client when the client was well and truly disconnected from the internet. I swapped modems and SIM cards around and isolated the problem to the network provider, i.e. it works on Telstra cable and Optus 3G but not on Telstra 3G.

How can the server think that it’s successfully sent data to the client which is not connected to the internet? I thought TCP was an end-to-end protocol.

Does anyone have any idea why this occurs, or how else I can go about doing this?

**MK27** · 07-04-2011

Originally Posted by berabin

I found that this does not immediately return an error when the connection is broken.

I have found the very same thing. It is very easy for the system to fail to detect a disconnection. My only real problem with this is it can lead to server crashes. While I have not tried the solution you used, it seems to me these can be prevented by using signal() to ignore SIGPIPE and SIGBUS, and always catching errors returned by recv/read/send/write. Certain errors, such as EAGAIN on a non-blocking socket, are acceptable within reason, so you can use something like:

Code:

        int errs = 0;
        [ ... ]
        while ((t < sz) && (r = read(sock, &data[t], sz-t))) {
                if (r == -1) {
                        if (errno == EAGAIN) {
                                if (++errs == 10) {
                                        error("takecall() EAGAIN repeated 10 times on read()", NULL);
                                        sendMsg(sock, "ERROR");
                                } else continue;
                        } else error("takecall()->read() failed: ", strerror(errno), NULL);
                        close(sock);
                        return;
                }
                t += r;
        }

Notice I also make an attempt to signal the client of an error. Although that send will most likely fail, there is a chance that the connection was actually still good, in which case the client can know that it was broken purposely due to a server issue and should attempt to re-connect and repeat the transaction. I then have a sort of parallel loop in the client which will automatically retry 5-10 times when it receives "ERROR", then give up and inform the user. These can get played out occasionally.
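For reference, ignoring SIGPIPE and checking the send result could look roughly like the sketch below. The names ignore_sigpipe and send_all are hypothetical helpers, not functions from the server code above. Note that a return of 0 only means the kernel accepted the data, not that the client received it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ignore SIGPIPE process-wide so that writing to a closed connection
 * returns -1 with errno == EPIPE instead of terminating the process.
 * Returns 0 on success, -1 on failure. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Send the whole buffer, retrying on partial sends and on EINTR.
 * Returns 0 once everything was handed to the kernel, -1 on error
 * (errno is left as set by send()). */
int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            return -1;      /* EPIPE, ECONNRESET, EAGAIN, ... */
        }
        sent += (size_t)n;
    }
    return 0;
}
```

Calling ignore_sigpipe() once at startup has a similar effect to passing MSG_NOSIGNAL on every send(), except that it also covers write().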

Originally Posted by berabin

How can the server think that it’s successfully sent data to the client which is not connected to the internet,

Don't take my word for it, but IMO you cannot prevent this without implementing some basic protocol of your own. In other words, if you want to be sure the client received the data, have the client send a reply -- never simply assume the transfer is complete. Of course, if the client has not received the data because it disconnected, there is nothing the server can do but note the event anyway. If the server does not receive confirmation, that may mean the client did not receive data -- or it may mean its reply never occurred/was cut. However, if the server receives confirmation, particularly confirmation of the number of bytes read, then you can be sure the transfer was a success.

One technique I've used before is to make the first four bytes of every transmission an integer indicating the remaining length of the message (like "Content-Length" works with HTTP). This is useful in various ways, such as allocating a buffer, and can help confirm that the entire message was received. Another easy, useful element to add to this simple protocol is to use the next byte as a bitfield or code about the nature of the message. E.g., you could flag this to indicate it is a confirmation, in which case the message is simply another 4-byte int indicating bytes received.
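A minimal sketch of such a header, under some assumptions that are not spelled out above: the length is sent in network byte order, it counts only the payload that follows the type byte, and the type values FRAME_TYPE_DATA and FRAME_TYPE_ACK are made up for illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_SIZE 5     /* 4-byte length + 1-byte type */
#define FRAME_TYPE_DATA   0x00  /* hypothetical code: ordinary message */
#define FRAME_TYPE_ACK    0x01  /* hypothetical code: confirmation */

/* Write the 5-byte header: payload length (big-endian), then the type. */
void frame_pack_header(uint8_t out[FRAME_HEADER_SIZE],
                       uint32_t payload_len, uint8_t type)
{
    uint32_t be = htonl(payload_len);

    memcpy(out, &be, sizeof(be));
    out[4] = type;
}

/* Read the 5-byte header back into host-order length and type. */
void frame_unpack_header(const uint8_t in[FRAME_HEADER_SIZE],
                         uint32_t *payload_len, uint8_t *type)
{
    uint32_t be;

    memcpy(&be, in, sizeof(be));
    *payload_len = ntohl(be);
    *type = in[4];
}
```

With this layout a confirmation would be a FRAME_TYPE_ACK header with a payload length of 4, followed by a 4-byte integer holding the number of bytes received.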

**berabin** · 07-04-2011

Hi MK27,
Thanks for your input. I'm not sure what you mean exactly when you say

it seems to me these can be prevented by using signal() to ignore SIGPIPE and SIGBUS, and always catching errors returned by recv/read/send/write.

but as you can see from the send() call I'm using MSG_NOSIGNAL, because I found that if it tries to send when the connection is broken the application exits (probably because I don't handle the signals properly).

I have also found that using read() and looking for -1 and errno is probably not the best approach, because I have set a read timeout and it always returns -1 with errno EAGAIN when there is no data to read, i.e. when the client has not sent anything. It is possible in my application for the client to not send anything for some time, and I don't want to break the connection in this case.

I think it's strange that I need to implement my own protocol because TCP does not do what it's supposed to do. I know that in TCP any data that is sent should be answered with an ACK from the other end. If the sender does not receive it, it retransmits until it times out. I expected that this is what was happening with the ISPs that worked, but I don't know what's happening with the one that does not. I think I'll do some local testing and look at what's happening in Wireshark.
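For what it's worth, the retransmission timeout mentioned here can take many minutes with default kernel settings. On Linux, two socket-level knobs that may shorten it are TCP keepalive probes and the TCP_USER_TIMEOUT option (see tcp(7)). The sketch below is only an illustration under that assumption: the function name and the parameter values a caller would pass are hypothetical, and it has not been tried against the Telstra 3G setup described above.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive probing and cap the time that
 * sent data may stay unacknowledged (Linux-specific options).
 *   idle_s     seconds of silence before the first keepalive probe
 *   interval_s seconds between probes
 *   probes     unanswered probes before the connection is dropped
 *   timeout_ms maximum time unacknowledged data may remain pending
 * Returns 0 on success, -1 on the first failing setsockopt(). */
int enable_dead_peer_detection(int fd, int idle_s, int interval_s,
                               int probes, unsigned int timeout_ms)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms, sizeof(timeout_ms)) == -1)
        return -1;
    return 0;
}
```

Once the kernel gives up on the peer, later send() or recv() calls on the socket should fail with an error such as ETIMEDOUT. Something in the path that answers on the client's behalf could still hide a dead client, which is where an application-level confirmation comes in.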

**Yarin** · 07-05-2011

Originally Posted by MK27

it seems to me these can be prevented by using signal() to ignore SIGPIPE and SIGBUS

It's good to catch SIGPIPE, but SIGBUS should never be raised during networking. If you're getting SIGBUS then something is wrong with your code.

Originally Posted by MK27

Code:

        int errs = 0;
        [ ... ]
        while ((t < sz) && (r = read(sock, &data[t], sz-t))) {
                if (r == -1) {
                        if (errno == EAGAIN) {
                                if (++errs == 10) {
                                        error("takecall() EAGAIN repeated 10 times on read()", NULL);
                                        sendMsg(sock, "ERROR");
                                } else continue;
                        } else error("takecall()->read() failed: ", strerror(errno), NULL);
                        close(sock);
                        return;
                }
                t += r;
        }

From the looks of this, if the socket reports no data (EAGAIN), you just call read() again 10 consecutive times. This is wrong: What if the network is slow, and can't get more data to the recv buffer within the time it takes to call read() 10 more times? Your code would incorrectly assume the connection is broken. You should poll() or select() for the data.

**MK27** · 07-05-2011

Originally Posted by Yarin

It's good to catch SIGPIPE, but SIGBUS should never be raised during networking. If you're getting SIGBUS then something is wrong with your code.

SIGBUS might only occur on local sockets -- SIGPIPE is definitely the more frequent and significant one. I guess I've taken a better safe than sorry stance there. I guess I should pay more attention to this.

From the looks of this, if the socket reports no data (EAGAIN), you just call read() again 10 consecutive times. This is wrong: What if the network is slow, and can't get more data to the recv buffer within the time it takes to call read() 10 more times? Your code would incorrectly assume the connection is broken. You should poll() or select() for the data.

Yes, it does make that (potentially incorrect) assumption -- I alluded to that (sending the client "ERROR" as an instruction to repeat just in case, even though I am assuming a bad connection). NB, this function is called via select(), sorry for not making that explicit -- if select() says there is data to read and then there is none, this is potentially a client that calls connect() then leaves, but no ECONNREFUSED gets raised. One of the things I have to deal with is potentially malicious clients, so I'd rather err on the side of conservatism.

If you do not close and just return the handle to the select queue, it may now do this perpetually. In the case of a light duty server (mine are, lol), with no other connections pending, that will cause the server to hog the processor: select() says read, read() returns EAGAIN, we hand the descriptor back to select(), and so on and on and on.

I prefer non-persistent connections, so telling the client to keep trying until it succeeds (as opposed to maintaining long term queues) fits in with that. But I'll mull this over for next time, Yarin, cheers.
