| | #1 |
| Woof, woof! Join Date: Mar 2007 Location: Australia
Posts: 3,295
| Read HTTP headers I'm writing a small app which uses part of the HTTP protocol. I've read the HTTP 1.1 RFC and looked at examples, and nothing mentions a maximum header size (e.g. how large the HTTP headers can be in bytes). The examples also allocate very different amounts of space for the headers (from 512 bytes to 2 KB). So I was wondering, what is the best way to read HTTP headers from a client? Read until you encounter CRLFCRLF? If so, how should you read from the socket to do that efficiently? Reading char-by-char from a socket seems rather "inefficient". Thanks Last edited by zacs7; 07-11-2007 at 10:56 PM. |
| zacs7 is offline | |
| | #2 |
| and the hat of Jobseeking Join Date: Aug 2001 Location: The edge of the known universe
Posts: 21,699
| > Reading char-by-char from a socket seems rather "inefficient". Perhaps, but then again trying something else might be premature optimisation disease. I suppose you could create a wrapper class which does block recv() calls and splits the data into consecutive CRLF lines, and maintains any residual data at the end (which would be the first part of the content) in an internal (to the class) buffer. |
| Salem is offline | |
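Salem's wrapper idea from #2 could look roughly like the following in plain C. This is only a sketch under assumptions: `struct hdr_reader`, `hdr_read` and `hdr_next_line` are made-up names, the 2 KB buffer is just the size floated in this thread (not a limit from the RFC), and POSIX sockets are assumed. It does block recv() calls until CRLFCRLF shows up, then hands out the CRLF-terminated lines one at a time; anything that arrived after the terminator stays in the buffer as the first part of the content.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper; the names and the 2 KB size are assumptions. */
struct hdr_reader {
    char   buf[2048];
    size_t len;       /* bytes received so far */
    size_t body_off;  /* offset of the first byte after CRLFCRLF */
};

/*
 * Fill r->buf with block recv() calls until CRLFCRLF appears.
 * Returns 0 on success, -1 on error, EOF, or headers larger than buf.
 */
static int hdr_read(struct hdr_reader *r, int sock)
{
    r->len = 0;
    r->body_off = 0;
    while (r->len < sizeof r->buf) {
        ssize_t n = recv(sock, r->buf + r->len, sizeof r->buf - r->len, 0);
        if (n <= 0)
            return -1;
        /* Rescan from 3 bytes before the new data, in case the
           terminator straddles two recv() calls. */
        size_t from = r->len >= 3 ? r->len - 3 : 0;
        r->len += (size_t)n;
        for (size_t i = from; i + 4 <= r->len; i++) {
            if (memcmp(r->buf + i, "\r\n\r\n", 4) == 0) {
                r->body_off = i + 4;
                return 0;
            }
        }
    }
    return -1;  /* buffer full and still no terminator */
}

/*
 * Return a pointer to the next header line starting at *pos (not
 * NUL-terminated; its length goes to *line_len) and advance *pos past
 * its CRLF. Returns NULL at the blank line that ends the headers.
 */
static const char *hdr_next_line(const struct hdr_reader *r,
                                 size_t *pos, size_t *line_len)
{
    for (size_t i = *pos; i + 1 < r->body_off; i++) {
        if (r->buf[i] == '\r' && r->buf[i + 1] == '\n') {
            const char *line = r->buf + *pos;
            *line_len = i - *pos;
            *pos = i + 2;
            return *line_len ? line : NULL;
        }
    }
    return NULL;
}
```

Once `hdr_read()` returns 0, the bytes from `buf + body_off` up to `buf + len` are the residual data Salem mentions, i.e. the start of the content, and the rest of the content can be read in large blocks.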
| | #3 |
| Malum in se Join Date: Apr 2007
Posts: 3,188
| Technically there is no limit to the header size, although in practice 2k is probably plenty. What specifically are you trying to do? If you just want to read a file from a server, look into InternetReadFile() in the Win32 API (WinINet). It handles all the header info and just returns the decoded file data.
__________________ Until you can build a working general purpose reprogrammable computer out of basic components from radio shack, you are not fit to call yourself a programmer in my presence. This is cwhizard, signing off. |
| abachler is offline | |
| | #4 |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,768
| It is. But consider that the header is not TOO big. It might be inefficient, but you're only reading the header part of the response character-by-character. Once you see the header terminator CR-LF-CR-LF, you know that the header is complete, and can switch to reading large blocks for the remainder of the response. A 3 kilobyte header would imply 3000 calls to recv(), but in the grand scheme, that's not so bad. If you had to read an entire 20 megabyte transmission character-by-character we'd have a different story. |
| brewbuck is offline | |
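For reference, the character-by-character approach brewbuck describes in #4 might be sketched like this. It is an illustration only: `read_header_bytewise` is a hypothetical name, POSIX sockets are assumed, and error handling is kept to a minimum. A small counter tracks how much of CR-LF-CR-LF has been matched so far, so no searching back through the buffer is needed.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Read one byte at a time until CR-LF-CR-LF has been seen.
 * Stores the header (terminator included) in out, NUL-terminated.
 * Returns the header length, or -1 on error, EOF, or if the header
 * does not fit in cap bytes. (Hypothetical helper, not a library call.)
 */
static long read_header_bytewise(int sock, char *out, size_t cap)
{
    static const char term[4] = { '\r', '\n', '\r', '\n' };
    size_t len = 0;
    int matched = 0;   /* how many terminator bytes matched so far */

    while (len + 1 < cap) {          /* keep one byte for the NUL */
        char c;
        ssize_t n = recv(sock, &c, 1, 0);
        if (n <= 0)
            return -1;
        out[len++] = c;
        if (c == term[matched])
            matched++;
        else
            matched = (c == '\r') ? 1 : 0;  /* a CR may start a new match */
        if (matched == 4) {
            out[len] = '\0';
            return (long)len;
        }
    }
    return -1;
}
```

One nice property of this version is that it never reads past the terminator, so the content is still sitting unread in the socket and can be fetched with large recv() calls afterwards.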
| | #5 |
| Cat without Hat Join Date: Apr 2003
Posts: 8,492
| Also remember that the network driver already does buffering, so it's not like it's receiving the data byte by byte.
__________________ All the buzzt! CornedBee "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code." - Flon's Law |
| CornedBee is offline | |
| | #6 | |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,768
| Quote:
I'd say ignore the issue for now -- if you start seeing problems, you'll have to come up with some sort of thin buffering layer. | |
| brewbuck is offline | |
| | #7 |
| Woof, woof! Join Date: Mar 2007 Location: Australia
Posts: 3,295
| I see, thanks everyone - very helpful. In fact I'm only really interested in the header, and only one part of it at that: "Host: ". I've decided to read it into a 2K buffer on the stack, find CR-LF-CR-LF and lop the end off. Is that wise? Or I was thinking of something a little more complex, Code:
int r = 0;
char buf[128];
char *header = NULL; /* grows as data arrives */
/* recv() may legitimately return fewer than sizeof(buf) bytes, so loop while r > 0 */
while ((r = recv(sock, buf, sizeof(buf), 0)) > 0)
{
    /* append the r bytes in buf to header (realloc and memcpy; buf is not NUL-terminated, so strcat is unsafe) */
    /* search 'header' starting 3 bytes before the newly added data; if we find \r\n\r\n, break */
}
Last edited by zacs7; 07-13-2007 at 12:24 AM. |
| zacs7 is offline | |
| | #8 |
| Cat without Hat Join Date: Apr 2003
Posts: 8,492
| Seems more trouble than necessary. Read in a buffer of any size, then parse it for headers. If you reach the end of the buffer but not the end of the headers, copy the unprocessed parts to the beginning of the buffer and fill the rest of it with new data. Continue until you reach the end of the headers.
| CornedBee is offline | |
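The pattern CornedBee describes in #8 (parse what is in the buffer, move the unprocessed tail to the front, refill) might be sketched as follows, here looking only for the "Host:" header that zacs7 is after. This is an assumption-laden illustration: `find_host` and `has_prefix_ci` are hypothetical helpers, POSIX sockets are assumed, and a line longer than the buffer is simply treated as an error.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Case-insensitive check that the n bytes at s start with lowercase p. */
static int has_prefix_ci(const char *s, size_t n, const char *p)
{
    size_t plen = strlen(p);
    if (n < plen)
        return 0;
    for (size_t i = 0; i < plen; i++)
        if (tolower((unsigned char)s[i]) != p[i])
            return 0;
    return 1;
}

/*
 * Scan the headers arriving on sock using the caller's fixed buffer.
 * Copies the value of the Host header into host (NUL-terminated).
 * Returns 1 if Host was found, 0 if the headers ended without it,
 * -1 on error, EOF, or a single line longer than cap bytes.
 */
static int find_host(int sock, char *buf, size_t cap,
                     char *host, size_t hostsz)
{
    size_t len = 0;   /* valid bytes currently in buf */
    int found = 0;

    for (;;) {
        ssize_t n = recv(sock, buf + len, cap - len, 0);
        if (n <= 0)
            return -1;
        len += (size_t)n;

        size_t pos = 0;   /* start of the first unprocessed line */
        for (size_t i = 0; i + 1 < len; i++) {
            if (buf[i] != '\r' || buf[i + 1] != '\n')
                continue;
            size_t line_len = i - pos;
            if (line_len == 0)
                return found;   /* blank line: end of headers */
            if (has_prefix_ci(buf + pos, line_len, "host:")) {
                size_t v = pos + 5;
                while (v < i && (buf[v] == ' ' || buf[v] == '\t'))
                    v++;        /* skip whitespace after the colon */
                size_t vlen = i - v;
                if (vlen >= hostsz)
                    vlen = hostsz - 1;
                memcpy(host, buf + v, vlen);
                host[vlen] = '\0';
                found = 1;
            }
            pos = i + 2;
            i++;                /* step over the LF */
        }

        /* Copy the unprocessed tail to the front and refill the rest. */
        len -= pos;
        memmove(buf, buf + pos, len);
        if (len == cap)
            return -1;          /* one line does not fit in the buffer */
    }
}
```

Because only one line has to fit at a time, the buffer can be much smaller than the full header block. Note that this sketch throws away any content bytes that arrive in the same recv() as the blank line; if the content matters, the offset of the unprocessed data would have to be returned as well.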
| | #10 |
| Cat without Hat Join Date: Apr 2003
Posts: 8,492
| The only problem with this approach is when a single header is larger than the buffer. If you're not interested in that header, you can just discard it; otherwise you must collect it across reads (and thus introduce state into your loop).
| CornedBee is offline | |
| | #12 |
| Cat without Hat Join Date: Apr 2003
Posts: 8,492
| Less resource intensive. You need less memory because you're not collecting all headers at once.
| CornedBee is offline | |
| | #13 |
| Woof, woof! Join Date: Mar 2007 Location: Australia
Posts: 3,295
| Hmm, thanks for that. Why didn't they build some sort of length field for the headers into the HTTP protocol? They have Content-Length for the content, yet nothing for the headers? Bah Last edited by zacs7; 07-13-2007 at 07:48 AM. |
| zacs7 is offline | |
| | #14 |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,768
| It would be extremely complicated to account for all the different header fields when computing the header size, since not all of them are necessarily generated by the web server. A CGI or other web application might insert its own headers into the response without the server's knowledge. So in general, the server cannot know how big the header is going to be, although it can usually tell how big the content is, if it's serving up a simple file. |
| brewbuck is offline | |
| | #15 |
| Cat without Hat Join Date: Apr 2003
Posts: 8,492
| And if it can't, there's the chunked transfer encoding. Headers are even more complicated. Proxies can insert, change and remove headers. Not that that matters that much - if it's already parsing, it might as well insert the new length. But I think, in essence, it wasn't considered necessary. It's not that complicated. If you use Boost.Asio, for example, there's the read_until call that implements my pattern pretty much exactly.
| CornedBee is offline | |