Thread: Client timed-out once on connect(), can never connect() again

  1. #1
    Registered User
    Join Date
    Jun 2003
    Posts
    41

    Hi all,
    I have a fairly simple server running on system A, with several clients running on systems B (inside the LAN) and C (outside the LAN). The client on system C ran fine for hours, making all sorts of calls to connect(), but then timed out once, and ever since it has been unable to connect() to the server on A. Meanwhile the client on B, and even a client running on A itself, continue to work fine, so A is responding. They can open a connection, talk, close it, wait X minutes, reopen, etc. C, however, just times out on connect() forever.

    There are at least 3 firewalls between A and C, but none between A and B, so I could imagine it's a firewall issue. However the communications all take place over port 80 (http), and I can view C and get feedback, so I don't think it's a firewall issue. C is actually 152.20.76.100, so if you go there you get the outline of a webpage -- which makes me think it's not a firewall issue -- but C will never communicate with A.

    Some important notes:
    1) C WAS communicating great for a few hours, then hit a timeout on connect() ONCE and could never re-connect()
    2) restarting the process on C does not fix the problem.
    3) the client does other things and responds great, so it's not "frozen". For example, if you go to 152.20.76.100 you can submit queries to the graph and get feedback, so the client DOES respond to a user, it just can no longer reach A.

    Any ideas? Do you think it could be some very intelligent firewall that looks at HTTP headers (like the 'User-Agent' field) or something, and filters the packets on that level? I figured a firewall would either block an entire port or leave it open, so if I can view the website, it's not a firewall issue.

    I'm out of ideas. I've done tons of load testing internal to our LAN and no problems, so I don't know if C being outside our LAN is coincidence or not.

    Any ideas or suggestions would be greatly appreciated.
    Thanks.

    This is called every time C has data to send to A. The caller then calls another method to actually write() to the socket, but it never gets that far:
    Code:
      
      
    Boolean RemoraServer::OpenCDMOSocket()
    {
       struct sockaddr_in ServerAddress; // home base server address
    
       //open our socket
       if ((CDMOSocket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
       {
          Log("While trying to open a socket to the CDMO, it failed");
          return (FALSE);
       } //end if
    
       //resolve our hostname into a host struct to get the IP address
       struct hostent *hostStructure = gethostbyname(repositoryURL);
       if (hostStructure == NULL)
       {
          Log("gethostbyname() failed with this exit code: " + itoa(h_errno));
          close(CDMOSocket);
          CDMOSocket = -1;
          return (FALSE);
       } //end if hostStructure failed
    
       //construct the server address structure
       memset(&ServerAddress, 0, sizeof(ServerAddress));   // zero out the structure to begin with
       ServerAddress.sin_family      = AF_INET;
       ServerAddress.sin_addr.s_addr = inet_addr(inet_ntoa( *(struct in_addr *) hostStructure->h_addr_list[0] ) ); //just try the first address
       ServerAddress.sin_port        = htons(repositoryPort);
       int connect_value = 0;
    
       // now establish the actual connection to home base
       if ((connect_value = connect(CDMOSocket, (struct sockaddr *) &ServerAddress, sizeof(ServerAddress))) < 0)
       {
          Log("Connect failed - (" + itoa(errno) + ") " + (JString)strerror(errno));
          Log((JString)hostStructure->h_addr_list[0]);
          close(CDMOSocket);
          CDMOSocket = -1;
          return (FALSE);
       } //end if
    
       return (CDMOSocket != -1 ? TRUE : FALSE);
    
    } //end RemoraServer::OpenCDMOSocket()


    These are the msgs that are logged, which show the exact same error (timeout) over and over and over:
    Fri Oct 17 12:14:32 2003
    SERVER Our host successfully added the requested data
    Fri Oct 17 12:14:32 2003
    SERVER We received this command:PUT NOCRC CR10-150 1066380300 11.24800 97.84900 23.76600 1019.09998 1.65390 312.00000
    Fri Oct 17 12:14:32 2003
    SERVER Our host successfully added the requested data
    Fri Oct 17 12:14:32 2003
    SERVER CDMOSocket was -1 so we're going to try and open it
    Fri Oct 17 12:17:41 2003
    SERVER Connect failed - (110) Connection timed out
    Fri Oct 17 12:17:41 2003
    SERVER ¿®
    Fri Oct 17 12:17:41 2003
    SERVER ERROR: Could not open CDMO socket
    Fri Oct 17 12:17:41 2003
    SERVER We were unable to write our data string to the CDMO
    Fri Oct 17 12:17:41 2003
    SERVER We were not able to transmit this data to the CDMO
    Fri Oct 17 12:17:41 2003
    SERVER We could not fully process the command 'PUT NOCRC CR10-150 1066380300 11.24800 97.84900 23.76600 1019.09998 1.65390 312.00000 '
    Fri Oct 17 12:17:41 2003
    SERVER We received this command:PUT NOCRC CR10-150 1066359600 12.77800 95.02100 25.98500 1020.00000 1.92210 309.00000
    Fri Oct 17 12:17:41 2003
    SERVER Our host successfully added the requested data
    Fri Oct 17 12:17:41 2003
    SERVER CDMOSocket was -1 so we're going to try and open it

    etc....................
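
    The timestamps above show each failed connect() blocking for roughly three minutes (12:14:32 to 12:17:41) before errno 110 comes back. A minimal sketch of one way to bound that wait, assuming a POSIX system: put the socket in non-blocking mode, start the connect(), and poll() for writability with a caller-chosen timeout. The helper names connect_ipv4_with_timeout and close_and_fail, and their parameters, are hypothetical and not part of the RemoraServer code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd without losing the errno that describes the real failure. */
static int close_and_fail(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: connect to a dotted-quad IPv4 address and port,
 * giving up after timeout_ms milliseconds.
 * Returns a connected, blocking socket, or -1 with errno set. */
int connect_ipv4_with_timeout(const char *ip, unsigned short port, int timeout_ms)
{
    assert(ip != NULL);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return -1;

    /* In non-blocking mode connect() returns at once with EINPROGRESS. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return close_and_fail(fd);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (errno != EINPROGRESS)
            return close_and_fail(fd);

        /* Wait until the socket is writable, meaning the attempt finished.
         * An interrupted poll() is treated as a failure in this sketch. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
            errno = ETIMEDOUT;
        if (ready <= 0)
            return close_and_fail(fd);

        /* Writable does not mean connected: SO_ERROR holds the outcome. */
        int err = 0;
        socklen_t len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
            return close_and_fail(fd);
        if (err != 0) {
            errno = err;
            return close_and_fail(fd);
        }
    }

    /* Restore blocking mode for the caller's later write()/read() calls. */
    if (fcntl(fd, F_SETFL, flags) < 0)
        return close_and_fail(fd);
    return fd;
}
```

    For example, connect_ipv4_with_timeout("12.170.16.134", 80, 5000) would give up after five seconds rather than waiting for the kernel default. This only shortens the failure; it does not explain why the connection fails.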

  2. #2
    anonytmouse
    Yes, my avatar is stolen
    Join Date
    Dec 2002
    Posts
    2,544
    I can connect to cboard.cprogramming.com; however, cboard cannot connect to me. In other words, just because A can connect to C does not mean C can connect to A.

    Many ISPs block incoming port 80 but it would be a large coincidence if they turned this on just when you were developing your program.

    What changed? Did you change ports, change server computers, or administer a router or firewall? Is port forwarding set up correctly? Did the internal IP address of the server change (this could break port forwarding)?

    My guess would be the last one. Are you using port forwarding?

    Can computer C ping computer A?
    Can you try another port?

    From C's perspective, A is either nonexistent or behind a stealth firewall.

  3. #3
    Registered User
    Join Date
    Jun 2003
    Posts
    41
    Well see, that's the thing: development only took place on systems A and B. This is the first time I loaded anything onto C, because C is a customer beta testing site. Unfortunately I have no idea how their network is set up; however, our agreement was that port 80 had to remain open to allow things to work properly, so I can't try another port. I did not change anything on our side, and the clients on B continue to work fine.

    I don't know whether it's plausible that their firewall is blocking only my traffic through port 80, or whether my code isn't written well. I think it's much more likely my code is the problem, especially since I can browse to http://152.20.76.100 and see the webpage being served by system C, and I doubt firewalls are smart enough to tell "normal" HTTP traffic apart from my data, which is sent in HTTP GET msgs.

    Someone recommended I set the SO_KEEPALIVE option on the socket. I heard that doesn't work with Linux, but I'm going to look into it. I'm not sure that would explain my problem.
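
    For reference, SO_KEEPALIVE is available on Linux and is set with setsockopt(). A minimal sketch follows; the helper names enable_keepalive and keepalive_enabled are made up for illustration. Keepalive probes only apply to an already established, idle connection, so the option would not be expected to change how connect() itself times out.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive probes for socket fd.
 * Returns 0 on success, -1 with errno set on failure. */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Hypothetical helper: report whether keepalive is currently enabled.
 * Returns 1 if enabled, 0 if disabled, -1 on error. */
int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    assert(len == sizeof(int)); /* SO_KEEPALIVE is an int-sized option */
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

    In the code from post #1, the call would go between socket() and connect(), for example enable_keepalive(CDMOSocket).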

    I cannot ping ANYTHING from C; however, ping doesn't use port 80, right? So that might not be a problem. I'm only "guaranteed" that port 80 is open.

    I CAN telnet thru port 80 and get to A, if I "telnet www.nerrenvirons.org 80" (that's A), I get a
    Trying 12.170.16.134 <---correct
    Connected to www.nerrenvirons.org <-correct
    Escape character is '^]'.

    so that means C can get to A thru port 80, right?

    Thanks for the help anonytmouse.

  4. #4
    anonytmouse
    Yes, my avatar is stolen
    Join Date
    Dec 2002
    Posts
    2,544
    It works fine for me. Here was the code I used:
    Code:
    #include <winsock.h>
    #include <stdio.h>
    
    #define Log printf
    
    BOOL OpenCDMOSocket(void) { 
    
    	struct sockaddr_in ServerAddress;
    	WSADATA wsaData; 
    	SOCKET CDMOSocket;
    
    	WSAStartup(MAKEWORD(1,1), &wsaData); 
      
    	if ((CDMOSocket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == INVALID_SOCKET) { 
    		Log("While trying to open a socket to the CDMO, it failed"); 
    		return (FALSE); 
    	}
    
    	struct hostent * hostStructure = gethostbyname("www.nerrenvirons.org"); 
    	if (hostStructure == NULL) { 
    		Log("gethostbyname() failed"); 
    		return (FALSE); 
    	}
      
    	memset(&ServerAddress, 0, sizeof(ServerAddress));
    	ServerAddress.sin_family      = AF_INET;
    	ServerAddress.sin_addr.s_addr = inet_addr(inet_ntoa( *(struct in_addr *) hostStructure->h_addr_list[0] ) );
    	ServerAddress.sin_port        = htons(80);  
    	int connect_value = 0;
    
    	printf("Connecting...\n");
    	printf("Address is %s\n", inet_ntoa( *((struct in_addr *) &ServerAddress.sin_addr.s_addr)));
    
    	if ((connect_value = connect(CDMOSocket, (struct sockaddr *) &ServerAddress, sizeof(ServerAddress))) < 0) { 
    		Log("Connect failed"); 
    		return (FALSE);
    	}
    
    	printf("Connected\n");
    	send(CDMOSocket, "GET / HTTP/1.1\r\n\r\n", lstrlen("GET / HTTP/1.1\r\n\r\n"), 0);
    
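    	char myBuf[4096];
    	int cbReceived;
    
    	// leave room for the terminator and check for errors before indexing
    	cbReceived = recv(CDMOSocket, myBuf, sizeof(myBuf) - 1, 0);
    	if (cbReceived < 0) {
    		Log("recv failed");
    		return (FALSE);
    	}
    	myBuf[cbReceived] = '\0';
    
    	printf("%s", myBuf);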
    
    	return TRUE;
    }
    
    int main() {
    	OpenCDMOSocket();
    	getchar();
    	return 0;
    }
    And here is the result:
    Code:
    Connecting...
    Address is 12.170.16.134
    Connected
    HTTP/1.0 200 OK
    Connection: close
    Date: Sun, 19 Oct 2003 18:19:41 GMT
    Server: Apache/2.0.40 (Red Hat Linux)
    Last-Modified: Thu, 16 Oct 2003 22:24:10 GMT
    ETag: "19a7a9-234-56b43680"
    Accept-Ranges: bytes
    Content-Length: 564
    Content-Type: text/html; charset=ISO-8859-1
    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR
    /html4/frameset.dtd">
    <html>
    <head>
    <title>NERR Telemetry</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    </head>
    <frameset cols="95,*"> <!-- 1017 -->
       <frame src="./pages/leftframe.html" name="leftFrame" frameborder="0" scrollin
    g="no" noresize marginwidth="0">
       <frame src="./pages/data.html" name="mainFrame" frameborder="0" scrolling="ye
    s">
       <noframes>
       Unfortunately your browser does not support frames
       </noframes>
    </frameset>
    </html>
    Are you sure repositoryURL and repositoryPort are valid?
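
    One way to confirm what repositoryURL resolves to on each attempt is to log the address in dotted-quad form. In post #1, Log((JString)hostStructure->h_addr_list[0]) treats the raw address bytes as a C string, which is probably why the log shows unreadable characters right after the "Connect failed" line. A minimal sketch, assuming IPv4 and POSIX inet_ntop(); the helper names format_ipv4 and set_ipv4_from_raw are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: render 4 raw IPv4 bytes (network byte order, as
 * found in hostent::h_addr_list[0]) as printable text.
 * Returns out on success, NULL if the buffer is too small. */
const char *format_ipv4(const void *raw, char *out, size_t outlen)
{
    assert(raw != NULL && out != NULL);
    return inet_ntop(AF_INET, raw, out, (socklen_t)outlen);
}

/* Hypothetical helper: copy the raw bytes straight into the sockaddr,
 * which avoids the inet_ntoa()/inet_addr() text round trip. */
void set_ipv4_from_raw(struct sockaddr_in *sa, const void *raw)
{
    assert(sa != NULL && raw != NULL);
    memcpy(&sa->sin_addr, raw, sizeof(sa->sin_addr));
}
```

    With a buffer declared as char text[INET_ADDRSTRLEN], format_ipv4(hostStructure->h_addr_list[0], text, sizeof(text)) yields a string that can be logged next to the errno message.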

  5. #5
    Registered User
    Join Date
    Jun 2003
    Posts
    41
    Wow, great work, thanks for the confirmation. That's the thing: connect() will work great for about 4 hours, then for 6-8 hours connect() times out every time. Then the cycle repeats. Even rebooting doesn't solve it. It might be firewall related; I'm discussing it with the various sysadmins who oversee the 3 firewalls in place. I've been code reviewing and monitoring for leaks, in case something is using up all the allowed file descriptors or all the memory and that's why connect() times out, but I haven't seen any evidence of that.
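
    A descriptor leak of the kind described above can be checked from inside the process by counting open descriptors and logging the number periodically. A minimal sketch, assuming a POSIX system; the helper name count_open_fds and its probe_cap parameter are hypothetical. A count that keeps climbing across connect attempts would point to a leak, while a steady count would rule it out.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/resource.h>

/* Hypothetical helper: count descriptors currently open in this process.
 * Probes descriptors 0 .. N-1, where N is the smaller of probe_cap and the
 * soft RLIMIT_NOFILE limit. Returns the count, or -1 if the limit cannot
 * be read. */
long count_open_fds(long probe_cap)
{
    assert(probe_cap > 0);

    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) < 0)
        return -1;

    /* Cap the scan so a very large or unlimited soft limit stays cheap. */
    long max = probe_cap;
    if (lim.rlim_cur != RLIM_INFINITY && (rlim_t)max > lim.rlim_cur)
        max = (long)lim.rlim_cur;

    long open_count = 0;
    for (long fd = 0; fd < max; fd++) {
        /* F_GETFD fails with EBADF when fd is not an open descriptor. */
        if (fcntl((int)fd, F_GETFD) != -1)
            open_count++;
    }
    return open_count;
}
```

    Logging count_open_fds(4096) before each call to OpenCDMOSocket() would be one way to use it.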

    And the repository stuff should be fine, I set them when the object is constructed and never change them:


    Code:
    RemoraServer::RemoraServer()
       : CommonApplication("/nerr/remora/.RemoraServer.debug", "SERVER"),
         dataListenSocket(-1),
         dataServerSocket(-1),
         defaultDataPort(11428),
         myProtocol("tcp"),
         numberOfHosts(0),
         repositoryURI("/pages/ProcessRemoteData.php"),
         repositoryURL("www.nerrenvirons.org"),
         repositoryPort(80),
         myConfigFile("/nerr/remora/host.config"),
         CDMODataString(""),
         CDMOCmdsCollectedSoFar(0),
         CDMOMaxCmdsPerDataString(20), //start out small for experimentation
         amITheCDMO(FALSE),
         CDMOSocket(-1),
         numberOfFailedAttempts(0)
    {
       //setup our signal handler
       if (signal(SIGPIPE, &SignalHandler) == SIG_ERR)
       {
          //do nothing here, just don't die for now
       }
    
    } //RemoraServer::RemoraServer()

  6. #6
    anonytmouse
    Yes, my avatar is stolen
    Join Date
    Dec 2002
    Posts
    2,544
    Are you able to connect via telnet from C to A during the timeout period? That would be truly weird. Are you rebooting both client and server?

  7. #7
    Registered User
    Join Date
    Jun 2003
    Posts
    41
    Sorry for the late feedback. After much packet sniffing I found that our domain name (www.nerrenvirons.org) was being resolved to its internal LAN IP address during the connect() failures, and to its public internet address during the connect() successes. I have no clue how this is even possible, but with the IP address hardcoded instead of the domain name, it works great, since no name resolution needs to take place. Use the domain name and it breaks periodically. Use the IP address and it works for hundreds of hours. After we updated our gateway (it was RH 7.2, very outdated), the periodic failure stopped, and our clients have been running fine ever since. Very weird. I don't know why the address would sometimes be published as internal and other times as public, but since the problem vanished with the gateway upgrade, I'm putting it down to a bug in an RPM, or several bugs that would collide from time to time. Thanks for all the time and feedback you gave me!!
