Small HTTP proxy
Okay, I'm creating a small HTTP proxy (not HTTPS...), but I've hit some walls. I've read almost the entire HTTP RFC (Like a million billion pages).
So when I get a client, I chuck it in its own thread and start reading from the socket, but I need to read some headers, "Host:" for example. How should I go about sending the contents/headers to the server? What's the best way to read the headers (until I hit CR LF CR LF)? Currently I read a buffer of x bytes, grow a larger buffer, append the contents of the small buffer to it, and check for CR LF CR LF. Then I process the header, read directly from the client, and send it to the server. Is that the best way?
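For reference, the accumulate-and-scan approach described above could look roughly like this. It is only a sketch: `struct hbuf`, `hbuf_append` and `hbuf_header_end` are made-up names, and the starting capacity of 512 is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that holds everything read so far. */
struct hbuf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append n bytes, doubling the capacity as needed.
 * Returns 0 on success, -1 if realloc fails (the buffer is left intact). */
int hbuf_append(struct hbuf *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (b->len + n > b->cap) {
        size_t cap = b->cap ? b->cap : 512;
        char *p;
        while (cap < b->len + n)
            cap *= 2;
        p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Returns the offset just past the first CR LF CR LF, or 0 if the
 * terminator has not arrived yet. Bytes past that offset are body data. */
size_t hbuf_header_end(const struct hbuf *b)
{
    size_t i;
    assert(b != NULL);
    for (i = 0; i + 4 <= b->len; i++) {
        if (memcmp(b->data + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}
```

Call `hbuf_append` after every read and then `hbuf_header_end`. Once it returns non-zero, whatever sits past that offset was read along with the header and still has to be forwarded to the server. This version rescans the whole buffer on every call, which is fine for small headers but could be tightened by remembering the last scan position.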
Sorry if I haven't been clear, basically I'm asking what's the best way to process HTTP headers for a small proxy server (and modify them if I have to, then send them on)?
Thanks in Advance.
Have you written a 'null' proxy which simply forwards everything in both directions?
Then isn't it just a matter of extracting each header and deciding
- pass through unmodified
- pass through modified
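A rough sketch of that per-header decision follows. The policy shown (rewriting any Connection header to "Connection: close") is purely an assumed example, and `filter_header` is a made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Copies the header line to forward into out (without the trailing CR LF).
 * Returns 1 if the line was modified, 0 if it passes through unmodified,
 * and -1 if out is too small. Header names are matched case-insensitively. */
int filter_header(const char *line, char *out, size_t outsz)
{
    static const char name[] = "Connection:";
    int n;

    assert(line != NULL && out != NULL);
    if (strncasecmp(line, name, strlen(name)) == 0) {
        /* pass through modified (example policy only) */
        n = snprintf(out, outsz, "Connection: close");
        return (n < 0 || (size_t)n >= outsz) ? -1 : 1;
    }
    /* pass through unmodified */
    n = snprintf(out, outsz, "%s", line);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Each line is handled on its own, so adding another rule is just another comparison before the default copy.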
Ahh I see, so rather than trying to handle the headers as a whole, I should be doing it header-by-header, editing each one if necessary and passing it on?
Hey, I've just quickly whipped together an HTTP proxy server, if you would like to see how I've done it.
It's small and simple, but it processes the header line by line so you can do stuff like block hosts, etc.
Awesome thanks. One thing I notice you do though is read from the socket byte by byte, and as I discovered a while ago this isn't the "best" way to do it: http://cboard.cprogramming.com/networking-device-communication/91687-read-http-headers.html
I was also wondering, does it matter if I use blocking sockets in their own threads?
The one I posted uses blocking sockets in its own thread. It uses the select function to determine whether a new connection needs to be accepted or whether there is any data to receive from the client or server.
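As a minimal sketch of the select part (not the posted code itself; `wait_readable` is a made-up helper, and the listening socket is left out to keep it short):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Blocks until the client or the server socket has data to read.
 * Returns the readable descriptor (the client wins if both are ready),
 * or -1 if select fails. */
int wait_readable(int client, int server)
{
    fd_set rd;
    int maxfd = client > server ? client : server;

    assert(client >= 0 && server >= 0);
    FD_ZERO(&rd);
    FD_SET(client, &rd);
    FD_SET(server, &rd);

    /* No timeout: sleep until at least one descriptor is readable. */
    if (select(maxfd + 1, &rd, NULL, NULL, NULL) < 0)
        return -1;
    return FD_ISSET(client, &rd) ? client : server;
}
```

The forwarding loop would call this, recv from whichever descriptor comes back, and send the data to the other side. To also accept new connections, the listening socket would go into the same fd_set.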
Receiving byte by byte is probably the best way to do it, because you know that you will only be receiving the header and not part of the data (from a POST request or something). It shouldn't affect the performance, because it only receives byte by byte while it's receiving the first header; after that it just forwards all data in chunks. Obviously receiving a whole file byte by byte would be bad, but it only does it for the header.
> I should be doing it header-by-header, editing each one if necessary and passing it on?
Well it seemed the simpler approach to me.
Perhaps later on there may be a need for dependencies (say changing the client also affects the accepted encodings).
Or you wait for the double \r\n which marks the end of the header, then proceed to tokenise the headers into \r\n terminated lines.
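A sketch of that tokenising step, assuming the whole header block is already in one NUL-terminated buffer (`split_headers` is a made-up name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits a header block in place. The CR of each CR LF is overwritten
 * with a NUL, and lines[i] points at the start of line i. Stops at the
 * blank line that ends the header, or when max_lines have been stored.
 * block must be NUL-terminated. Returns the number of lines stored. */
size_t split_headers(char *block, char **lines, size_t max_lines)
{
    size_t n = 0;
    char *p = block;

    assert(block != NULL && lines != NULL);
    while (n < max_lines) {
        char *eol = strstr(p, "\r\n");
        if (eol == NULL || eol == p)   /* no more lines, or the blank line */
            break;
        *eol = '\0';
        lines[n++] = p;
        p = eol + 2;                   /* skip past the CR LF */
    }
    return n;
}
```

lines[0] is the request line, and the rest are the individual headers, ready to be inspected one at a time.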
As always, it is just a matter of breaking the process down into functions and data structures which perform very clear steps (much easier to test and debug).
The way I've done it is to receive line by line (by receiving byte by byte until we reach \n), then trim the end of the line to remove any \r's or \n's. If the length of the line is 0, it means it's a blank line, and any more data we receive now will be payload data.
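Roughly like this (a cut-down sketch rather than the exact code from the proxy; `recv_line` is a made-up name, and it does not retry on EINTR):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Reads one line from sock a byte at a time, stopping after '\n'.
 * Trailing '\r' characters are trimmed and the result is NUL-terminated.
 * Returns the trimmed length (0 means the blank line that ends the
 * header), or -1 on error, on EOF before a newline, or if the line
 * does not fit in buf. */
long recv_line(int sock, char *buf, size_t bufsz)
{
    size_t len = 0;
    char c;

    assert(buf != NULL && bufsz > 0);
    for (;;) {
        ssize_t r = recv(sock, &c, 1, 0);
        if (r <= 0)
            return -1;          /* error, or peer closed mid-line */
        if (c == '\n')
            break;              /* end of line; '\n' is not stored */
        if (len + 1 >= bufsz)
            return -1;          /* line too long for buf */
        buf[len++] = c;
    }
    while (len > 0 && buf[len - 1] == '\r')
        len--;
    buf[len] = '\0';
    return (long)len;
}
```

Keep calling it until it returns 0. Because nothing past the blank line has been read from the socket, everything still waiting there is payload.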
Pretty easy if you look at it that way ;), thanks a lot Salem & 39ster.