Thread: Header parsing problem

  1. #1
    Registered User
    Join Date
    May 2011
    Posts
    66

    Header parsing problem

    Hi.

    I need to parse a header read from a socket, which has the following structure.

    Bit 1: indicates whether this is the final fragment of a message
    Bit 2, 3, 4: always 0
    Bit 5, 6, 7, 8: an opcode indicating the type of the message (text/binary/control etc.)
    Bit 9: indicates whether the message is masked
    Bit 10-16: if the value is < 126, it is the actual length of the message; if it is 126, the following 16 bits hold the length; if it is 127, the following 64 bits hold the length.

    The header is actually the WebSocket protocol header, and it contains some other information, but that is irrelevant to my question.

    I am writing the parser in C, and I don't know how to access a specific value in the header. I created a structure and mapped the header values onto it, and that worked, but because the length of the header itself can vary, this is no longer an option.

    I saw an example where the author extracted the length of the message (starting with the second byte in the string) with bitwise operations, like this:

    int message_length = tmp_buff[1] & 127;

    It gives the correct value, but can somebody explain how this actually works, and maybe give me an example of how to extract, let's say, the opcode, or the value from the two bytes following the length?

    Thank you.

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    bitwise operator tutorial - Google Search
    You need to make sure you're up to speed on the whole & | ^ << >> deal when you start messing around with bits.

  3. #3
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    It appears that the bit numbering proceeds from left to right in your example.
    Code:
    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    f  0  0  0  p  p  p  p  m  l  l  l  l  l  l  l
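    I also assume the 16 bits are accessed as two bytes, tmp_buf[0] and tmp_buf[1], and that tmp_buf is an array of unsigned char (with plain char, a byte with its top bit set may be negative, and right-shifting a negative value is implementation-defined).

    To extract the 'masked' bit, use (tmp_buf[1] >> 7) and check whether that's 0 or non-zero as required.
    To extract the 'opcode' field, use (tmp_buf[0] & 15).
    To extract the 'final' bit, use (tmp_buf[0] & 128) and check whether that's 0 or non-zero as required.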
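
    As a rough sketch, the three expressions above plus the length mask from the original post could be collected in one helper. The struct and function names here are made up for illustration, and the buffer is assumed to be unsigned char:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the fixed two-byte part of the header. */
struct ws_basic_header {
    unsigned fin;     /* bit 1: final-fragment flag */
    unsigned opcode;  /* bits 5-8: message type */
    unsigned masked;  /* bit 9: mask flag */
    unsigned len7;    /* bits 10-16: 7-bit length field */
};

/* Decode the first two header bytes; buf must hold at least 2 bytes. */
static struct ws_basic_header parse_basic_header(const unsigned char *buf)
{
    struct ws_basic_header h;

    assert(buf != NULL);
    h.fin    = (buf[0] >> 7) & 0x01;  /* top bit of byte 0 */
    h.opcode =  buf[0]       & 0x0F;  /* low four bits of byte 0 */
    h.masked = (buf[1] >> 7) & 0x01;  /* top bit of byte 1 */
    h.len7   =  buf[1]       & 0x7F;  /* low seven bits of byte 1 */
    return h;
}
```

    For the two bytes 0x81 0x85 (binary 1000 0001, 1000 0101) this gives fin = 1, opcode = 1, masked = 1 and len7 = 5.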

  4. #4
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    int message_length = tmp_buff[1] & 127;
    I'll explain it with a 4-bit unsigned variable.
    When you have a number, say 7, it is represented as 0111 in binary. I'll write binary numbers as 0bxxxx to avoid confusion.
    Code:
    7 & 4 = 4
    
    is the same as
    0b0111 & 0b0100
    = 0b0100
     
    Or more generally
    0bx1xx & 0b0100
    = 0b0100
    
    0bx0xx & 0b0100
    = 0b0000
    Notice that if bit 2 is set (counting from lsb = bit 0), the result is 4 regardless of what the other bits are. If bit 2 is not set, the result is 0. So by using the bitwise 'and', the code can isolate different parts of a value.

    With the example you gave, 127 is 0b01111111, so you can see that the code is isolating the lowest 7 bits of tmp_buff[1] and discarding the top bit.
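
    The same idea generalises to a field anywhere in a value: shift it down to bit 0, then mask off everything above it. A minimal sketch, with a helper name invented for this post:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract a field that is `width` bits wide and whose least significant
 * bit sits at position `shift` (lsb = bit 0).
 */
static uint32_t extract_field(uint32_t value, unsigned shift, unsigned width)
{
    assert(shift < 32 && width >= 1 && width < 32);
    /* (1 << width) - 1 is a mask with the low `width` bits set. */
    return (value >> shift) & ((UINT32_C(1) << width) - 1u);
}
```

    With this, extract_field(0xFE, 0, 7) is 126 (the same as 0xFE & 127), and extract_field(0xFE, 7, 1) is 1 (the top bit that the mask dropped).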

  5. #5
    Registered User
    Join Date
    Nov 2012
    Location
    Some rock floating in space...
    Posts
    32

    Quote Originally Posted by raczzoli
    I need to parse a header read from a socket, which has the following structure. [...]
    I recommend making your life a little easier by using bit fields in a structure. Below is a link explaining bit fields in more depth.

    Bit Fields in C

  6. #6
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Quote Originally Posted by twiki View Post
    I recommend making your life a little easier and use "bit fields" in a structure.. Below is a link explaining bit fields in more depth.

    Bit Fields in C
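    Careful. The order in which bit fields are laid out within a struct is implementation-defined, so the same declaration can map to different bits of the header on different compilers.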
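
    Shifts and masks do not depend on the compiler's layout, and they also cope with the variable-length part that a fixed struct cannot describe. Here is a sketch of reading the payload length, including the 16-bit and 64-bit cases from the original question. The function name is made up, and it assumes the extra length bytes arrive most significant byte first (network byte order), which is what the WebSocket spec, RFC 6455, uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read the payload length from a header held in buf[0..n-1].
 * On success returns 0 and stores the length and the number of header
 * bytes consumed so far (any further fields, such as a masking key,
 * are not counted). Returns -1 if the buffer is too short.
 */
static int ws_payload_length(const unsigned char *buf, size_t n,
                             uint64_t *payload_len, size_t *header_len)
{
    unsigned len7;
    size_t ext, i;
    uint64_t len;

    assert(buf != NULL && payload_len != NULL && header_len != NULL);
    if (n < 2)
        return -1;

    len7 = buf[1] & 0x7F;                             /* 7-bit length field */
    ext = (len7 == 126) ? 2 : (len7 == 127) ? 8 : 0;  /* extra length bytes */
    if (n < 2 + ext)
        return -1;

    len = len7;
    if (ext > 0) {
        /* Build the value one byte at a time, most significant first. */
        len = 0;
        for (i = 0; i < ext; i++)
            len = (len << 8) | buf[2 + i];
    }

    *payload_len = len;
    *header_len = 2 + ext;
    return 0;
}
```

    For the bytes 0x82 0x7E 0x01 0x00, the 7-bit field is 126, so the next two bytes are combined as (0x01 << 8) | 0x00, giving a payload length of 256 and a header length of 4.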
