Query multiple servers quickly

**Stack Overflow** · 05-03-2008

Hello,

I am trying to determine the best possible solution to accomplish a simple, yet complex task.

I've been working on a new project for quite some time and have written a custom framework from the ground up. It's more or less a (Client --> Server --> Server <-- Client) system.

So far I have the basics done. Here's the process:

I expect to have one client connect to my listening server.
That client will periodically send me a string of IP Addresses (10 or more at a time).
When I receive the data, I split it up accordingly and query the IPs just received.

All of that I can handle, and about 95% of that code is complete. The IP addresses I expect to receive belong to other servers, which I will attempt to query and gather data from.

However, the problem is I may need to query up to 100 or 1000 servers and I don't want to have to wait for one to finish, in case it's unresponsive and gets delayed the full timeout length before moving on.

With that said, I want to query as many servers as fast as possible. And I don't care which order they come back in. As soon as I receive the data I just need to save it to a file (no further communication to the original client is required).

I am writing this in C and in the Linux environment -- so is there a way to even do that? Would I need to use the fork() command or write my own custom Socket Pool to handle multiple outgoing connections at once?

- Stack Overflow

**Salem** · 05-03-2008

Welcome back, you've been away far too long.

You could look at the way nmap does this.

**Stack Overflow** · 05-03-2008

Originally Posted by Salem

Welcome back, you've been away far too long.

Thanks. It's good to be back! 8)

Originally Posted by Salem

You could look at the way nmap does this.

That suggestion may be of some help down the line, however I'd like to show you an example of what I'm trying to do in code format:

Code:

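#include <stdlib.h>	/* atoi */
#include <string.h>	/* strrchr */

/* queryData() is the existing query routine, declared elsewhere */

void queryServers(int n, char data[][64]) {
	/*** n = number of servers to query ***/
	/*** data[x] is the string holding IP and Port info ***/
	/*** Example: 123.45.67.89:10110 ***/

	/* local variables ... */
	int i, port;
	const char *colon;

	/* loop through all servers needed to be queried */
	for (i = 0; i < n; i++) {
		/* parse the port after the last ':' and convert to integer */
		colon = strrchr(data[i], ':');
		if (colon == NULL)
			continue;	/* skip entries without a port */
		port = atoi(colon + 1);

		/* function to query server */
		queryData(data[i], port);
	}

	/* end */
	return;
}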
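The port parsing step in the loop above could also be factored into a helper that validates each entry. This is only a minimal sketch, assuming every entry is an IPv4 "address:port" string like the example; the name split_host_port is hypothetical and not part of the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "a.b.c.d:port" into host and port.
 * Returns 0 on success, -1 if the entry is malformed. */
int split_host_port(const char *entry, char *host, size_t hostlen, int *port)
{
	const char *colon = strrchr(entry, ':');
	char *end;
	long value;
	size_t len;

	if (colon == NULL || colon == entry)
		return -1;              /* no ':' or empty host part */

	len = (size_t)(colon - entry);
	if (len >= hostlen)
		return -1;              /* host would not fit in the buffer */

	value = strtol(colon + 1, &end, 10);
	if (end == colon + 1 || *end != '\0' || value < 1 || value > 65535)
		return -1;              /* port missing, trailing junk, or out of range */

	memcpy(host, entry, len);
	host[len] = '\0';
	*port = (int)value;
	return 0;
}
```

Unlike atoi(), strtol() lets the caller detect a missing or malformed port, so a bad entry can be skipped instead of being queried on port 0.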

Keep in mind that I have already written the query code. I know exactly what to query and what data I should receive. Very much like querying a game server -- you query a specific IP & Port and it dumps a string back to you (like map name, etc). I just need to save that dump information to a file.

What I'm struggling with is developing an efficient way to complete a lot of server queries without losing time if one of them times out or hangs.

My code above is very incomplete, as you can probably tell, but it's the basic idea. The for loop is very bad because it only processes one query at a time and I want to find a good way to tackle a lot of them even if I have to open multiple sockets and threads to get it done.

So I'd find a way to open 10 different connections to 10 different servers and begin querying. As soon as one is done, I'd close that connection and begin querying the 11th. So on and so forth... If one of them takes fifteen seconds to complete the query and the other 9 finish sooner than that, it then frees up more slots to begin querying more servers while that slow one doesn't affect the rest of the group.

I just don't know where to begin and what the best concept would be. And I don't know if it's even possible, but I think it should be.

**Codeplug** · 05-03-2008

Any reasonable strategy is going to include non-blocking sockets. This would allow connect() calls to return immediately. Then the socket can go into fd_set, allowing select() to poll multiple sockets for connect() completion.
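As a rough illustration of that idea, here is a minimal sketch, assuming TCP over IPv4. The helper names start_connect and wait_connected are hypothetical. The first starts a connect() that returns immediately; the second uses select() on the write set to wait for it to finish.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: start a non-blocking TCP connect to ip:port.
 * Returns the socket (the connect may still be in progress) or -1. */
int start_connect(const char *ip, int port)
{
	struct sockaddr_in addr;
	int fd, flags;

	memset(&addr, 0, sizeof addr);
	addr.sin_family = AF_INET;
	addr.sin_port = htons((unsigned short)port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
		return -1;              /* not a valid dotted-quad address */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd);
		return -1;
	}

	/* Returns at once; EINPROGRESS means "still connecting". */
	if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 &&
	    errno != EINPROGRESS) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: wait up to timeout_sec for the connect to finish.
 * Returns 0 if connected, -1 on failure or timeout. */
int wait_connected(int fd, int timeout_sec)
{
	fd_set wfds;
	struct timeval tv;
	int err = 0;
	socklen_t len = sizeof err;

	FD_ZERO(&wfds);
	FD_SET(fd, &wfds);
	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;

	/* The socket turns writable when connect() succeeds or fails. */
	if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
		return -1;

	/* SO_ERROR tells the two outcomes apart. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
		return -1;
	return 0;
}
```

In a real program many sockets would share one fd_set and one select() call, rather than each being waited on separately as wait_connected does here.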

gg

**Stack Overflow** · 05-05-2008

Excellent suggestion, Codeplug.

I've made a lot of progress since your post and have decided to implement non-blocking sockets into my program. I have one question though:

How do you send out data and then catch the response using non-blocking processing?

So far I have a non-blocking server that does what you explained in your post -- it waits to see if there are any incoming connections or data, but doesn't block. However I can't seem to find out how to query another server and have my non-blocking server pick up the response. I think it may be related to the sendto() or recvfrom() command, but I can't get my head around it.

A diagram:

Code:

     Client
	\
	 \
	  \
	Server		Non-Blocking Server
	    \		  /
	     \		 /
	      \		/
	  	Server

So to clarify, the Server directly below the Client and the Non-Blocking Server in the diagram are my listening servers. I use the fork() command to give my Non-Blocking Server full control when it's time to start querying the list of servers sent by the original client. I initialize the NB server only when it's time to start querying and close it down when I'm completely done.

Layout: One Client connects to my basic Server. That Client sends a string of IP Addresses to me. I parse the received data into a readable list and start querying the servers.

Lastly, which is where I'm stuck, I need the queried servers to send their data to my Non-Blocking Server -- so it doesn't block per server query in case I have 1000 servers to poll.

I have the non-blocking server configured 100%. I just need to know if it's possible to receive that data on a specific socket or port. Also, as a piece of information, my two servers operate on two different ports. For example my basic server listens on port 10110 and my NB server listens on 10111. I need the poll results sent to port 10111 so my NB server can pick them up properly.

I hope that makes sense. Thanks again for any assistance you may be able to provide.

**Perspective** · 05-06-2008

>I just need to know if it's possible to receive that data to a specific socket

You can use poll() to see which of a set of file descriptors has data available to read.

**abachler** · 05-06-2008

Use multiple threads, and register the handles and the thread/socket creation time. Periodically test each thread, and if it hasn't responded after a set time, close the socket associated with that thread. This will cause the blocking call to return, and your error handling code should cause the thread to exit.

**Codeplug** · 05-07-2008

>> You can use poll()...
Yes, it's a very similar API to select(). On a side-note, I found an interesting read on using one or the other: http://www.unix.com/ip-networking/37...t-vs-poll.html
Be sure to check out that Chandra paper too: http://www.usenix.org/events/usenix0...apers/chandra/

>> Use multiple threads, register ... thread/socket creation time
We're talking thousands of sockets here - I wouldn't recommend a thread per socket. A thread per-core at most - but that's complicating things a bit.

Once the single-threaded approach has been coded up and can get through 10 or 1000 queries at a time, it can be made to run as multiple threads, in the *hopes* that extra cores will be utilized to achieve speedup. However, if there is a bottleneck in the network stack or hardware interface (like a very busy PCI bus), you may not see any speedup at all.

>> A diagram:
I was a little confused by the diagram at first. I wanted to do a little ascii art so I made a diagram of what I gathered in your write-up as I read it:

Code:

 Client
    \
     \ (1)
      \       (3)
     Server -------> NB Server
     / │ \
    /  │  \
   /   │   \ (2)
  /    │    \
S1    S2 ... Sn

(1) Client connects and sends addresses of S1 through Sn to Server
(2) Server initiates connection to S1 through Sn, gathering "query" results
(3) Server sends results to NB

If this is the case, then you could establish the connection to NB first, then send results directly to NB as each of the Sn results come in.

If you really were thinking of having each Sn establish a connection to NB - then I'm thinking you created "NB" for the sole purpose of gathering the query results. If this is the case, then forget step (3), you won't need NB. It doesn't really make sense to have each Sn connect to NB - because then you're just putting a potential 1000+ connections burden on two servers instead of just one.

>> How do you send out data and then catch the response using non-blocking processing?
So you'll basically have an "event loop", where each "event" occurs when select (or poll, or epoll) returns.
pseudo code:

Code:

- Create socket and connect to NB (if you send results immediately)
- Create socket for S1 - Sn
- Call connect() for S1 - Sn, add socket to the "write" fd_set (WFDS)
- Start "event loop" - call select() or poll() etc...
   - As connections complete, "ForAll FD_ISSET(WFDS)"
      - Send query packet
      - Move socket into the "read" fd_set (RFDS)
   - As data becomes available, "ForAll FD_ISSET(RFDS)"
      - Read query response (don't forget, socket is non-blocking)
      - Send results for this Sn to NB now, or accumulate to send later
      - Close connection and remove from RFDS, if need be 
   - Continue loop
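The pseudo code above could be sketched in C roughly as follows. This is only an illustration under several assumptions: each socket has already had connect() started on it, the query and the response each fit in a single send()/recv(), and results are kept in memory instead of being sent to NB. The names struct conn, event_loop, MAX_CONN and RESP_MAX are hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define RESP_MAX 256

enum state { CONNECTING, WAITING, DONE };

struct conn {
	int fd;                 /* socket with connect() already started */
	enum state st;
	char resp[RESP_MAX];    /* response text, NUL-terminated */
};

/* Drives every connection until it is DONE or select() times out.
 * Returns the number of connections that produced a response. */
int event_loop(struct conn *c, int n, const char *query, int timeout_sec)
{
	int i, ok = 0, pending = n;

	while (pending > 0) {
		fd_set rfds, wfds;
		struct timeval tv;
		int maxfd = -1;

		/* Rebuild both sets: connecting sockets go in WFDS, the rest in RFDS */
		FD_ZERO(&rfds);
		FD_ZERO(&wfds);
		for (i = 0; i < n; i++) {
			if (c[i].st == DONE)
				continue;
			FD_SET(c[i].fd, c[i].st == CONNECTING ? &wfds : &rfds);
			if (c[i].fd > maxfd)
				maxfd = c[i].fd;
		}
		tv.tv_sec = timeout_sec;
		tv.tv_usec = 0;
		if (select(maxfd + 1, &rfds, &wfds, NULL, &tv) <= 0)
			break;          /* timeout or error: give up on the rest */

		for (i = 0; i < n; i++) {
			if (c[i].st == CONNECTING && FD_ISSET(c[i].fd, &wfds)) {
				int err = 0;
				socklen_t len = sizeof err;
				if (getsockopt(c[i].fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
				    err == 0 && send(c[i].fd, query, strlen(query), 0) >= 0) {
					c[i].st = WAITING;  /* query sent, now wait for the reply */
					continue;
				}
			} else if (c[i].st == WAITING && FD_ISSET(c[i].fd, &rfds)) {
				ssize_t got = recv(c[i].fd, c[i].resp, RESP_MAX - 1, 0);
				if (got < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
					continue;           /* nothing to read after all */
				if (got > 0) {
					c[i].resp[got] = '\0';
					ok++;
				}
			} else {
				continue;               /* no event on this socket */
			}
			close(c[i].fd);             /* finished or failed */
			c[i].st = DONE;
			pending--;
		}
	}

	for (i = 0; i < n; i++)             /* close anything that timed out */
		if (c[i].st != DONE) {
			close(c[i].fd);
			c[i].st = DONE;
		}
	return ok;
}
```

Note that select() only handles descriptors below FD_SETSIZE (commonly 1024 on Linux), so for 1000 or more simultaneous sockets poll() or epoll would be the safer choice. The timeout here also applies to each select() call as a whole, not to each server individually.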

gg

**Stack Overflow** · 05-09-2008

Thank you very much for all of your help and suggestions, Codeplug.

I haven't had much time to code this week nor have I thoroughly read your post, but it looks very insightful and I will begin implementation on Monday.

Thanks again and I will keep you informed.
