Thread: Read HTTP headers

  1. #1
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459

    Read HTTP headers

    Hello,

    I'm writing a small app that uses part of the HTTP protocol. I've read the HTTP/1.1 RFC and looked at examples, and the RFC doesn't mention a maximum header size (i.e. how large the HTTP headers can be in bytes). A lot of the examples allocate different amounts of space for the headers (from 512 bytes to 2 KB), so I was wondering: what is the best way to read HTTP headers from a client? Read until you encounter CRLFCRLF? If so, how should you read from the socket to achieve that efficiently?

    Reading char-by-char from a socket seems rather "inefficient".

    Thanks
    Last edited by zacs7; 07-11-2007 at 10:56 PM.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > Reading char-by-char from a socket seems rather "inefficient".
    Perhaps, but then again trying something else might be premature optimisation disease.

    I suppose you could create a wrapper class that makes block recv() calls, splits the data into consecutive CRLF-terminated lines, and keeps any residual data at the end (which would be the first part of the content) in a buffer internal to the class.
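A minimal C sketch of such a wrapper, assuming a connected blocking socket. The `linereader` struct, the `lr_*` functions and the 4096-byte buffer size are hypothetical choices for illustration. It makes block `recv()` calls, hands back one CRLF-terminated line at a time, and after the blank line any leftover bytes in `buf[start..end)` are the beginning of the body.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define LR_BUFSIZE 4096   /* hypothetical buffer size */

/* Hypothetical wrapper: buffers block recv() calls and splits CRLF lines. */
struct linereader {
    int sock;
    char buf[LR_BUFSIZE];
    size_t start;   /* first unconsumed byte */
    size_t end;     /* one past the last byte received */
};

void lr_init(struct linereader *lr, int sock)
{
    lr->sock = sock;
    lr->start = lr->end = 0;
}

/* Copies the next line (without its CRLF) into out, NUL-terminated.
 * Returns the line length, 0 for the blank line that ends the headers,
 * or -1 on error, EOF, or a line that does not fit in out or the buffer. */
long lr_readline(struct linereader *lr, char *out, size_t outsize)
{
    for (;;) {
        /* Look for a CRLF in the residual data. */
        for (size_t i = lr->start; i + 1 < lr->end; i++) {
            if (lr->buf[i] == '\r' && lr->buf[i + 1] == '\n') {
                size_t len = i - lr->start;
                if (len >= outsize)
                    return -1;
                memcpy(out, lr->buf + lr->start, len);
                out[len] = '\0';
                lr->start = i + 2;          /* consume the line and its CRLF */
                return (long)len;
            }
        }
        /* No complete line yet: move the residual to the front and refill. */
        memmove(lr->buf, lr->buf + lr->start, lr->end - lr->start);
        lr->end -= lr->start;
        lr->start = 0;
        if (lr->end == sizeof lr->buf)
            return -1;                      /* line longer than the buffer */
        ssize_t r = recv(lr->sock, lr->buf + lr->end, sizeof lr->buf - lr->end, 0);
        if (r <= 0)
            return -1;
        lr->end += (size_t)r;
    }
}
```

The caller loops on `lr_readline()` until it returns 0, then drains `buf[start..end)` before reading the body with plain `recv()` calls.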
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Technically the RFC sets no limit on the header size (individual servers do enforce their own), although in practice 2 KB is probably plenty. What specifically are you trying to do? If you just want to read a file from a server, look into InternetReadFile() in the Win32 API (WinINet). It handles all the header info and just returns the decoded file data.

  4. #4
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote (originally posted by zacs7):
    Reading char-by-char from a socket seems rather "inefficient".
    It is. But consider that the header isn't very big. It might be inefficient, but you're only reading the header part of the response character-by-character. Once you see the header terminator CR-LF-CR-LF, you know the header is complete and can switch to reading large blocks for the remainder of the response.

    A 3-kilobyte header would mean roughly 3,000 calls to recv(), but in the grand scheme of things that's not so bad. If you had to read an entire 20-megabyte transmission character-by-character, it would be a different story.
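As a rough illustration of this approach, here is a C sketch that assumes a connected blocking socket; `read_header_bytewise` is a hypothetical name. It issues one `recv()` per byte and stops right after CR-LF-CR-LF, so nothing from the body is consumed and the rest can be read with large `recv()` calls.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Reads the header one byte per recv() call until CR-LF-CR-LF.
 * Stores the header (terminator included) NUL-terminated in buf.
 * Returns its length, or -1 on error, EOF, or if buf is too small.
 * Nothing past the terminator is consumed. */
long read_header_bytewise(int sock, char *buf, size_t size)
{
    size_t len = 0;

    while (len + 1 < size) {                /* keep room for the NUL */
        ssize_t r = recv(sock, buf + len, 1, 0);
        if (r <= 0)
            return -1;
        len++;
        if (len >= 4 && buf[len - 4] == '\r' && buf[len - 3] == '\n'
                     && buf[len - 2] == '\r' && buf[len - 1] == '\n') {
            buf[len] = '\0';
            return (long)len;
        }
    }
    return -1;                              /* header larger than buf */
}
```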

  5. #5
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Also remember that the network driver already does buffering, so it's not like it's receiving the data byte by byte.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    brewbuck
    Quote (originally posted by CornedBee):
    Also remember that the network driver already does buffering, so it's not like it's receiving the data byte by byte.
    True, but on many systems each call to recv() is still a system call, which means a transition into and out of kernel mode. As data comes in, the network stack places it in a buffer, but if the application takes it out one byte at a time, it pays that system-call overhead once per byte.

    I'd say ignore the issue for now -- if you start seeing problems, you'll have to come up with some sort of thin buffering layer.
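One possible shape for such a thin buffering layer, sketched in C under the assumption of a blocking socket. The `bufsock` struct, `bufsock_getc()` and the 4096-byte buffer are hypothetical. It refills an internal buffer with one large `recv()` only when the buffer runs dry, so a byte-at-a-time header parser costs one system call per buffer-full rather than one per byte.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical buffered reader over a socket. */
struct bufsock {
    int sock;
    unsigned char buf[4096];
    size_t pos, len;            /* next byte to return / bytes held in buf */
};

void bufsock_init(struct bufsock *b, int sock)
{
    b->sock = sock;
    b->pos = b->len = 0;
}

/* Returns the next byte (0..255), or -1 on error or EOF.
 * Calls recv() only when the buffer is empty. */
int bufsock_getc(struct bufsock *b)
{
    if (b->pos == b->len) {
        ssize_t r = recv(b->sock, b->buf, sizeof b->buf, 0);
        if (r <= 0)
            return -1;
        b->pos = 0;
        b->len = (size_t)r;
    }
    return b->buf[b->pos++];
}
```

Any bytes still in `buf` after the header terminator are the start of the body, so the body reader has to drain them before calling `recv()` directly again.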

  7. #7
    zacs7
    I see, thanks everyone, very helpful. In fact I'm only interested in the header, and in just one field of it: "Host: "
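If only the `Host:` field matters, the header block can be searched for it once it has been read into memory. A minimal C sketch, assuming the header block is NUL-terminated and uses CRLF line endings; `find_header` is a hypothetical helper. It compares field names case-insensitively (HTTP field names are case-insensitive) using POSIX `strncasecmp()`, and trims only leading spaces and tabs from the value.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: finds field `name` in a NUL-terminated header block
 * with CRLF line endings, skipping the request/status line. Copies its
 * value (leading spaces/tabs trimmed) into out, NUL-terminated.
 * Returns 1 if found, 0 if absent or if out is too small. */
int find_header(const char *hdr, const char *name, char *out, size_t outsize)
{
    size_t nlen = strlen(name);
    const char *line = strstr(hdr, "\r\n");     /* skip the first line */

    while (line != NULL) {
        line += 2;
        if (line[0] == '\r' && line[1] == '\n')
            break;                              /* blank line: end of headers */
        const char *eol = strstr(line, "\r\n");
        if (eol == NULL)
            break;
        if (strncasecmp(line, name, nlen) == 0 && line[nlen] == ':') {
            const char *v = line + nlen + 1;
            while (*v == ' ' || *v == '\t')
                v++;
            size_t vlen = (size_t)(eol - v);
            if (vlen >= outsize)
                return 0;
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return 1;
        }
        line = eol;
    }
    return 0;
}
```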

    I've decided to read it into a 2 KB buffer on the stack, find CR-LF-CR-LF and lop off everything after it. Is that wise?
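One caveat with the fixed-buffer plan is that a single `recv()` may return only part of the header, so the read has to loop until the terminator shows up or the buffer is full. A C sketch, assuming a blocking socket and the 2 KB limit from the post; `read_header_2k` and `HDR_MAX` are hypothetical names. As in the plan, anything after the terminator is cut off, which would lose the start of a request body if there is one.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define HDR_MAX 2048

/* Hypothetical helper: fills buf with recv() calls until it contains
 * "\r\n\r\n". On success, NUL-terminates buf right after the terminator
 * (lopping off anything that followed) and returns the header length.
 * Returns -1 on error/EOF, or if the header doesn't fit in HDR_MAX - 1. */
long read_header_2k(int sock, char buf[HDR_MAX])
{
    size_t len = 0;

    while (len < HDR_MAX - 1) {             /* keep one byte for the NUL */
        ssize_t r = recv(sock, buf + len, HDR_MAX - 1 - len, 0);
        if (r <= 0)
            return -1;
        /* Restart the search 3 bytes early in case the terminator
           straddles two reads. */
        size_t from = len >= 3 ? len - 3 : 0;
        len += (size_t)r;
        buf[len] = '\0';
        char *end = strstr(buf + from, "\r\n\r\n");
        if (end != NULL) {
            end[4] = '\0';                  /* lop off anything after it */
            return (long)(end + 4 - buf);
        }
    }
    return -1;
}
```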

    Or I was thinking of something a little more complex,

    Code:
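    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    
    ssize_t r;
    char buf[128];
    char *header = NULL, *tmp;
    size_t len = 0;
    while ((r = recv(sock, buf, sizeof(buf), 0)) > 0)   /* 0 = peer closed, -1 = error */
    {
        tmp = realloc(header, len + r + 1);   /* grow; +1 for the terminator */
        if (tmp == NULL) break;               /* out of memory: keep what we have */
        header = tmp;
        memcpy(header + len, buf, r);         /* recv() doesn't NUL-terminate, so no strcat */
        len += r;
        header[len] = '\0';
        if (strstr(header, "\r\n\r\n")) break;   /* headers complete (may include some body) */
    }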
    Good or crappy way?
    Last edited by zacs7; 07-13-2007 at 12:24 AM.

  8. #8
    CornedBee
    Seems like more trouble than necessary. Read into a buffer of any size, then parse it for headers. If you reach the end of the buffer but not the end of the headers, copy the unprocessed part to the beginning of the buffer and fill the rest of it with new data. Continue until you reach the end of the headers.
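A sketch of that pattern in C, assuming a blocking socket; `read_headers`, the `line_fn` callback type and the example `count_line` callback are hypothetical names. Each complete line is handed to the callback as soon as it is parsed, and only the unprocessed tail is moved to the front before the next `recv()`, so the buffer never has to hold all of the headers at once.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Called once per header line (line ending stripped, not NUL-terminated). */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Reads header lines through a fixed buffer until the blank line.
 * On success, returns how many bytes after the headers (the start of the
 * body) were left at the front of buf. Returns -1 on error/EOF, or if a
 * single line is larger than the buffer. */
long read_headers(int sock, char *buf, size_t size, line_fn on_line, void *ctx)
{
    size_t len = 0;                     /* bytes currently in buf */

    for (;;) {
        ssize_t r = recv(sock, buf + len, size - len, 0);
        if (r <= 0)
            return -1;
        len += (size_t)r;

        size_t pos = 0;                 /* start of the unprocessed data */
        char *eol;
        while ((eol = memchr(buf + pos, '\n', len - pos)) != NULL) {
            size_t linelen = (size_t)(eol - (buf + pos));
            if (linelen > 0 && buf[pos + linelen - 1] == '\r')
                linelen--;              /* strip the CR of CRLF */
            size_t next = (size_t)(eol - buf) + 1;
            if (linelen == 0) {         /* blank line: end of headers */
                memmove(buf, buf + next, len - next);
                return (long)(len - next);
            }
            on_line(buf + pos, linelen, ctx);
            pos = next;
        }
        /* Copy the unprocessed part to the front; refill the rest next pass. */
        memmove(buf, buf + pos, len - pos);
        len -= pos;
        if (len == size)
            return -1;                  /* one line fills the whole buffer */
    }
}

/* Example callback: counts the header lines it is given. */
void count_line(const char *line, size_t len, void *ctx)
{
    (void)line;
    (void)len;
    ++*(int *)ctx;
}
```

The failure case at the end of the loop, a line that fills the whole buffer, is the situation the next reply in the thread addresses.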

  9. #9
    zacs7
    Ahhh, thanks

  10. #10
    CornedBee
    The only problem with this approach is when a single header is larger than the buffer. If you're not interested in that header, you can just discard it; otherwise you must collect it (and thus introduce state into your loop).
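To make the discarding option concrete, here is a push-style C sketch in which the caller feeds arbitrary chunks (for example, whatever `recv()` returned) into a hypothetical `hdr_feed()`. A line that does not fit in the fixed line buffer switches the parser into a "discarding" state until the next line feed; that flag is the extra loop state mentioned above. The struct, the function and the 64-byte line limit are all assumptions for illustration.

```c
#include <stddef.h>

#define LINE_MAX_LEN 64   /* hypothetical per-line limit */

/* Hypothetical incremental header parser that skips overlong lines. */
struct hdrparser {
    char line[LINE_MAX_LEN];
    size_t len;         /* bytes collected for the current line */
    int discarding;     /* 1 while skipping an overlong line */
    int done;           /* 1 once the blank line has been seen */
    int kept, dropped;  /* counts of lines kept / discarded */
};

/* Feeds n bytes. Returns how many were consumed; fewer than n means
 * the headers ended and the rest belongs to the body. */
size_t hdr_feed(struct hdrparser *p, const char *data, size_t n)
{
    size_t i;

    for (i = 0; i < n && !p->done; i++) {
        char c = data[i];
        if (c != '\n') {
            if (p->discarding)
                continue;
            if (p->len == sizeof p->line) {   /* line too long: drop it */
                p->discarding = 1;
                p->dropped++;
                continue;
            }
            p->line[p->len++] = c;
            continue;
        }
        /* End of a line. */
        if (p->discarding) {
            p->discarding = 0;
        } else {
            size_t len = p->len;
            if (len > 0 && p->line[len - 1] == '\r')
                len--;
            if (len == 0)
                p->done = 1;                  /* blank line: end of headers */
            else
                p->kept++;                    /* a real parser would handle line[0..len) here */
        }
        p->len = 0;
    }
    return i;
}
```

A parser is initialised with `struct hdrparser p = {0};` and fed until `p.done` is set.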

  11. #11
    zacs7
    I don't really see how my suggestion is more trouble than yours. Is yours faster or less resource-intensive?

  12. #12
    CornedBee
    Less resource-intensive. You need less memory because you're not collecting all the headers at once.

  13. #13
    zacs7
    Hmm, thanks for that.

    Why didn't they put some sort of header length field into the HTTP protocol? There's Content-Length for the content, yet nothing for the headers? Bah
    Last edited by zacs7; 07-13-2007 at 07:48 AM.

  14. #14
    brewbuck
    Quote (originally posted by zacs7):
    Hmm, thanks for that.

    Why didn't they put some sort of header length field into the HTTP protocol? There's Content-Length for the content, yet nothing for the headers? Bah
    It would be extremely complicated to account for all the different header fields when computing the header size, since not all of them are necessarily generated by the web server. A CGI or other web application might insert its own headers into the response without the server's knowledge. So in general, the server cannot know how big the header is going to be, although it can usually tell how big the content is, if it's serving up a simple file.

  15. #15
    CornedBee
    And if it can't, there's the chunked transfer encoding.
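In the chunked transfer encoding, the body arrives as a series of chunks, each preceded by its size in hexadecimal on its own CRLF-terminated line (optionally followed by a `;` extension), with each chunk's data also ending in CRLF; a chunk of size zero ends the body. A simplified in-memory decoder as a C sketch, assuming the whole chunked body is already in a NUL-terminated buffer of text; `dechunk` is a hypothetical name, it skips chunk extensions, and it does not examine any trailer fields after the last chunk.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder: copies the payload of a complete chunked body
 * (NUL-terminated) into out. Returns the payload length, or -1 if the
 * input is malformed or out is too small. */
long dechunk(const char *in, char *out, size_t outsize)
{
    size_t total = 0;

    for (;;) {
        char *end;
        unsigned long size = strtoul(in, &end, 16);   /* hex chunk size */
        if (end == in)
            return -1;
        const char *crlf = strstr(end, "\r\n");       /* skips ";ext" if present */
        if (crlf == NULL)
            return -1;
        if (size == 0)
            return (long)total;                       /* last chunk */
        const char *data = crlf + 2;
        if (strlen(data) < size + 2 || total + size > outsize)
            return -1;
        memcpy(out + total, data, size);
        total += size;
        if (data[size] != '\r' || data[size + 1] != '\n')
            return -1;                                /* chunk data must end in CRLF */
        in = data + size + 2;
    }
}
```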

    Headers are even more complicated. Proxies can insert, change and remove headers. Not that that matters much: if a proxy is already parsing the headers, it might as well insert the new length.

    But I think, in essence, it wasn't considered necessary. It's not that complicated. If you use Boost.Asio, for example, there's the read_until call that implements my pattern pretty much exactly.
