Server Architecture

I am trying to decide how to design the following server architecture. The
server will accept() connections from a web server over either a UNIX
socket or TCP/IP. Once a connection has been established, the web server sends
request-specific data over the socket. The server must then parse the first
few dozen bytes of that data to determine where to forward the request.
In other words, the server exists only to forward each request to the correct
C++ program.

Each C++ program instance can only handle one request at a time. So, either
the server or the C++ program itself must maintain an appropriate number of
instances of each program (the instances are not thread-safe). Furthermore,
the instance needs to have all the data sent by the web server available and
must also be able to send information back (either directly or indirectly) over
the same initially accept()ed connection. At the end of each request, this
connection is closed, but the program instance should remain persistent.

I'm looking for some advice on how best to design the server and the C++
programs (given that the behaviour of the web server is out of my control).
Here are two options I came up with (any feedback would be appreciated):

Many Threads
The server could be multi-threaded, with each thread repeatedly calling
accept(). Using state shared by all threads, the thread would determine which
program to hand the request to and get a file descriptor for a free instance
of that program. If there are not enough free instances, the thread would
start another one. Each program instance would be a standalone executable,
which could, for example, be started from the command line. It would bind
itself to a socket specified at startup, and the connection on this socket
would always remain open. Once the thread has found the socket for an
instance, it would send the file descriptor of the initially accept()ed
connection over this socket. The instance should then be able to read the
remainder of the data directly from the web server and also reply directly to
the web server. The thread would wait (call select()) until it receives a
finished indication over the instance socket.

Multiplexing
With the above approach, each thread sits idle while the program instance and
the web server communicate. Instead, the thread could hand off the connection
and not wait for the instance. There would then be far fewer worker threads,
plus a control thread that periodically calls select() on all instances marked
as busy in the thread-global storage. Any instance that has since written a
finished indication would be marked as free again. One problem I can see is
that instances will often still be flagged as busy after they have finished
(until the control thread next runs).

Thank you for reading this.

Be more specific: what exactly are you trying to do? Why wouldn't a possibly much simpler solution such as LAMP work for you?

I'm trying to use the FastCGI protocol to handle dynamic HTTP requests. The basic idea is that the web server handles all static requests. Dynamic requests, identified by URLs such as /dynamic/..., are forwarded to the FastCGI server. The advantage is that the FastCGI server can be on a different physical machine (so the web server and the FastCGI server would communicate over TCP/IP). Also, the web server can load balance across several FastCGI servers on several machines.
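
With FastCGI, the first bytes on each connection are a record header. Every FastCGI record starts with a fixed 8-byte header (version, type, request id, content length, padding length, reserved). The following is a minimal sketch of decoding that header; FcgiHeader and parse_fcgi_header are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical decoded form of the 8-byte FastCGI record header.
struct FcgiHeader {
    std::uint8_t version;          // 1 for FastCGI version 1
    std::uint8_t type;             // e.g. 1 = FCGI_BEGIN_REQUEST, 4 = FCGI_PARAMS
    std::uint16_t request_id;      // big-endian on the wire
    std::uint16_t content_length;  // big-endian on the wire
    std::uint8_t padding_length;
};

// Decode the header at the start of buf. Returns false if fewer than
// 8 bytes are available or the version is not 1.
bool parse_fcgi_header(const unsigned char* buf, std::size_t len,
                       FcgiHeader& out) {
    if (len < 8) return false;
    out.version = buf[0];
    out.type = buf[1];
    out.request_id = static_cast<std::uint16_t>((buf[2] << 8) | buf[3]);
    out.content_length = static_cast<std::uint16_t>((buf[4] << 8) | buf[5]);
    out.padding_length = buf[6];   // buf[7] is reserved
    return out.version == 1;
}
```

The request URL itself is not in the first record: it arrives as name-value pairs in the FCGI_PARAMS records that follow FCGI_BEGIN_REQUEST, so routing on the URL probably means reading (or peeking) past the first header.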

The problem is that the FastCGI server needs to be able to spawn the actual handler programs which do the work (and which reside on the same machine as the FastCGI server). So, for example, /dynamic/hello_world should be forwarded to a free instance of the hello_world application. The spawned processes should be persistent (i.e. not started and killed for each request).
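
The mapping from URL to application could be as simple as the sketch below. It assumes that the application name is the first path segment after /dynamic/; app_for_path is a hypothetical helper, and a real server would probably also check the name against a list of known applications before spawning anything.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the application name for a request path,
// i.e. the first path segment after "/dynamic/". Returns an empty string
// if the path is not a dynamic request or names no application.
std::string app_for_path(const std::string& path) {
    const std::string prefix = "/dynamic/";
    if (path.compare(0, prefix.size(), prefix) != 0) return "";
    // The name ends at the next '/' or at the start of the query string.
    std::size_t end = path.find_first_of("/?", prefix.size());
    if (end == std::string::npos) return path.substr(prefix.size());
    return path.substr(prefix.size(), end - prefix.size());
}
```

For example, app_for_path("/dynamic/hello_world") yields "hello_world", while a static path such as "/images/logo.png" yields an empty string.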