I am trying to decide how to design the following server architecture. The
server will accept() connections from a web server over either a UNIX-domain
socket or TCP/IP. Once a connection has been established, the web server sends
request-specific data over the socket. The server must then parse the first
few dozen bytes of that data to determine where to forward the request. In
other words, the server exists only to forward each request to the correct
C++ program.
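One detail worth checking: if the server read()s those first bytes, they are consumed and the instance will never see them. A minimal sketch, assuming a hypothetical request format in which the target program's name comes first and is terminated by a space or newline: recv() with MSG_PEEK inspects the bytes while leaving them in the socket's receive buffer (this works for both TCP and UNIX-domain stream sockets). The name peek_program_name and the 64-byte buffer are illustrative choices, not an existing API.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical request format: the target program's name comes first,
// terminated by a space or newline (e.g. "calc ...").
// recv() with MSG_PEEK copies bytes out of the socket's receive buffer
// without removing them, so whoever reads this descriptor later still
// sees the complete request.
std::optional<std::string> peek_program_name(int fd) {
    char buf[64];  // "the first few dozen bytes"
    ssize_t n = recv(fd, buf, sizeof buf, MSG_PEEK);
    if (n <= 0) return std::nullopt;  // error, or the peer closed the connection
    for (ssize_t i = 0; i < n; ++i) {
        if (buf[i] == ' ' || buf[i] == '\n')
            return std::string(buf, static_cast<std::size_t>(i));
    }
    // Delimiter not received yet. A real server would poll() and peek again
    // (or give up after a timeout) instead of failing immediately.
    return std::nullopt;
}
```

The peeked bytes are still readable afterwards, so the descriptor can be handed to an instance unchanged.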

Each C++ program instance can handle only one request at a time, and instances
are not thread-safe. So either the server or the C++ program itself must
maintain an appropriate number of instances of each program. Furthermore, the
instance needs access to all of the data sent by the web server, and must also
be able to send information back (either directly or indirectly) over the same
connection that was originally accept()ed. At the end of each request, this
connection is closed, but the program instance should persist.
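The bookkeeping described above could live in a per-program pool. This is a minimal sketch, assuming each instance is represented by the file descriptor of its control socket and that spawn (a hypothetical callback) starts a new instance and returns that descriptor. A mutex-protected free list guarantees that no instance is handed to two requests at once, and a condition variable makes callers wait when the per-program limit is reached.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-program pool. Each instance is identified by the file
// descriptor of its control socket. Since an instance is not thread-safe,
// acquire() hands each instance to exactly one caller until release().
class InstancePool {
public:
    // spawn: hypothetical callback that starts a new instance and returns
    // its control-socket descriptor, or -1 on failure. It may be called
    // from several threads at once, so it must be thread-safe.
    InstancePool(std::size_t max_instances, std::function<int()> spawn)
        : max_(max_instances), spawn_(std::move(spawn)) {}

    // Returns a free instance; starts a new one if below the limit,
    // otherwise blocks until release() makes one available.
    int acquire() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            if (!free_.empty()) {
                int fd = free_.back();
                free_.pop_back();
                return fd;
            }
            if (total_ < max_) {
                ++total_;          // reserve a slot before unlocking
                lock.unlock();     // starting a process may be slow
                int fd = spawn_();
                if (fd < 0) {
                    lock.lock();
                    --total_;      // spawn failed: give the slot back
                    cv_.notify_one();
                }
                return fd;         // -1 tells the caller the spawn failed
            }
            cv_.wait(lock);
        }
    }

    // Called when an instance reports that its request has finished.
    void release(int fd) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            free_.push_back(fd);
        }
        cv_.notify_one();
    }

private:
    const std::size_t max_;
    std::function<int()> spawn_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<int> free_;      // idle instances
    std::size_t total_ = 0;      // idle + busy instances, including ones being started
};
```

Whether this pool lives in the server or is managed by the programs themselves is exactly the open design question; the sketch only shows the server-side variant.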

I'm looking for some advice on how best to design the server and the C++
programs (given that the behaviour of the web server is out of my control).
Here are two options I came up with (any feedback would be appreciated):

Many Threads
The server could be multi-threaded, with each thread calling accept() in a
loop. Using information shared by all threads, a thread could determine which
program to interface with and obtain a file descriptor for a free instance of
that program. If no instance is free, the thread would start another one. So
each program instance would be a standalone executable, which could, for
example, be started from the command line. It would bind itself to a socket
specified at startup, and this connection would remain open for the lifetime
of the instance. Once the thread has found a free instance, it would send the
file descriptor of the accept()ed connection over the instance's socket (which
requires that socket to be a UNIX-domain socket). The instance could then read
the remainder of the data directly from the web server and also write its
response directly to the web server. Meanwhile, the thread would wait (in
select()) until it receives a finished indication over the instance's socket.
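Handing over the accept()ed descriptor is done with sendmsg()/recvmsg() and an SCM_RIGHTS control message, which only works over UNIX-domain sockets. A minimal sketch of both ends, where send_fd and recv_fd are hypothetical helper names and error handling is reduced to a boolean or -1:

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <cassert>
#include <cstring>

// Server side: passes `fd` to the process at the other end of the
// UNIX-domain socket `channel`. At least one byte of ordinary data must
// accompany the SCM_RIGHTS ancillary message.
bool send_fd(int channel, int fd) {
    char byte = 'F';
    iovec iov;
    iov.iov_base = &byte;
    iov.iov_len = 1;
    alignas(cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))];
    std::memset(ctrl, 0, sizeof ctrl);
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof ctrl;
    cmsghdr* cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(channel, &msg, 0) == 1;
}

// Instance side: receives a descriptor sent by send_fd(); returns -1 on
// failure. The kernel installs a new descriptor in the receiving process
// that refers to the same open connection.
int recv_fd(int channel) {
    char byte;
    iovec iov;
    iov.iov_base = &byte;
    iov.iov_len = 1;
    alignas(cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))];
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof ctrl;
    if (recvmsg(channel, &msg, 0) != 1) return -1;
    cmsghdr* cm = CMSG_FIRSTHDR(&msg);
    if (cm == nullptr || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    std::memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

After send_fd() succeeds, the server thread should close() its own copy of the connection; otherwise the connection stays open even after the instance closes its copy at the end of the request. The instance could then signal completion by writing a single byte back on its control socket, which is what the thread's select() would wait for.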

Multiplexing
With the above approach, the threads sit idle while the program instance and
the web server communicate. Instead, a thread could hand the connection over
without waiting for the instance to finish. There would then be far fewer
worker threads, plus a control thread that periodically calls select() on all
instances marked as busy in the shared storage. Any instance that has since
written a finished indication could then be marked as free again. One problem
I can see with this is that instances will often still be flagged as busy
after they have finished, until the control thread runs again.
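The control thread's check could look like the following sketch. It uses poll() rather than select() (same idea, but without select()'s FD_SETSIZE limit on descriptor values) and assumes, hypothetically, that the finished indication is a single byte written to the instance's control socket; collect_finished is an illustrative name.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <cassert>
#include <vector>

// Waits up to timeout_ms for any busy instance to report completion,
// consumes its one-byte finished indication (hypothetical protocol), and
// returns the control sockets of the instances that are free again.
std::vector<int> collect_finished(const std::vector<int>& busy, int timeout_ms) {
    std::vector<pollfd> fds;
    fds.reserve(busy.size());
    for (int fd : busy) {
        pollfd p{};
        p.fd = fd;
        p.events = POLLIN;
        fds.push_back(p);
    }
    std::vector<int> finished;
    if (poll(fds.data(), fds.size(), timeout_ms) <= 0)
        return finished;  // timeout or error: nothing finished
    for (const pollfd& p : fds) {
        // A real control thread would also treat POLLHUP/POLLERR as
        // "instance died" and start a replacement.
        if (p.revents & POLLIN) {
            char byte;
            if (recv(p.fd, &byte, 1, 0) == 1)  // consume the indication
                finished.push_back(p.fd);
        }
    }
    return finished;
}
```

Calling it with a non-zero timeout lets the control thread block until some instance finishes, rather than waking up periodically, which narrows the window in which free instances are still flagged as busy. Because the busy set changes while the thread is blocked, one common approach is to add an extra descriptor (for example, the read end of a pipe) that worker threads write to whenever they mark an instance busy, so the control thread wakes up and rebuilds its list.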

Thank you for reading this.