Thread: Bitfields, Bit/Little Endian

  1. #1
    Registered User
    Join Date
    Oct 2007
    Posts
    22

    Bitfields, Bit/Little Endian

    Hi all

    I'm currently looking at different structs that can represent an IP header. Now I found the following two things:

    Code:
    /*
     * From /usr/include/netinet/ip.h
     */
    struct iphdr
      {
    #if __BYTE_ORDER == __LITTLE_ENDIAN
        unsigned int ihl:4;
        unsigned int version:4;
    #elif __BYTE_ORDER == __BIG_ENDIAN
        unsigned int version:4;
        unsigned int ihl:4;
    #else
    # error "Please fix <bits/endian.h>"
    #endif
    ...
    ...
    }
     
    /*
     * From "ip.h" of the tcpdump source
     */
    struct ip {
    u_int8_t        ip_vhl;         /* header length, version */
    #define IP_V(ip)        (((ip)->ip_vhl & 0xf0) >> 4)
    #define IP_HL(ip)       ((ip)->ip_vhl & 0x0f)
    ...
    ...
    }
    As one can see, the first version uses bitfields to access the IP version and IP header length. It also seems to care about big/little endian.
    The second uses a u_int8_t to access the byte holding both header length and version, and separates the two via '&' and bit shifting.

    The second one makes sense to me: it doesn't care about byte order, and why should it? But why does the first version have to care about byte order? Isn't it true that the first four bits are always the IP version and the next four bits the IP header length?
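
    To make the second variant concrete, here is a minimal sketch of the shift-and-mask approach. The helper names ip_version and ip_header_len_bytes are made up for illustration and are not part of tcpdump; the only assumption is that the first octet of the header is already in a uint8_t.

```c
#include <stdint.h>

/* Hypothetical helpers (not from tcpdump): split the first octet of an
 * IPv4 header into its two 4-bit fields using only shifts and masks. */
static unsigned ip_version(uint8_t vhl)
{
    return (vhl >> 4) & 0x0fu;      /* upper nibble holds the version */
}

static unsigned ip_header_len_bytes(uint8_t vhl)
{
    return (vhl & 0x0fu) * 4u;      /* lower nibble counts 32-bit words */
}
```

    For a typical first octet of 0x45, ip_version gives 4 and ip_header_len_bytes gives 20. Because only arithmetic on a single byte is involved, the result does not depend on the compiler's bit-field layout.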

    Rafael

  2. #2
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by plan7 View Post
    The second one makes sense to me: it doesn't care about byte order, and why should it? But why does the first version have to care about byte order? Isn't it true that the first four bits are always the IP version and the next four bits the IP header length?
    Strictly, endianness does not apply to the smallest units of addressable storage, i.e. bytes. In this case however, the term seems to be applied to the order in which the compiler packs bits inside bitfields. Assume a simple structure:

    Code:
    struct bits
    {
        unsigned int x:4;
        unsigned int y:4;
    };
    The compiler will PROBABLY pack all 8 bits into a single byte. But does x get placed in the MORE SIGNIFICANT nybble, or the less significant one? That is the question the header file is trying to answer.

    I would not refer to this as "endianness" -- it's simply a convention of how bits are packed into larger words inside C bitfields.
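
    One way to see what a particular compiler does is to fill the two fields with distinct values and look at the first byte of storage. This is only a sketch: the function name first_byte_of_bits is invented here, and which value comes back is implementation-defined.

```c
#include <string.h>

struct bits
{
    unsigned int x:4;
    unsigned int y:4;
};

/* Hypothetical probe: returns the byte at the lowest address of a
 * struct bits holding x = 0xA and y = 0x5. */
static unsigned char first_byte_of_bits(void)
{
    struct bits b;
    unsigned char raw;

    memset(&b, 0, sizeof b);        /* clear padding so the result is stable */
    b.x = 0xA;
    b.y = 0x5;
    memcpy(&raw, &b, 1);            /* inspect the storage, not the fields */
    return raw;
}
```

    If the compiler puts x in the less significant nybble of the first byte, the probe returns 0x5A; if x lands in the more significant nybble, it returns 0xA5. Other results are possible on an implementation that packs differently.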

  3. #3
    Registered User
    Join Date
    Nov 2007
    Posts
    19

    The first one needs to care about which machine the program is running on, because big-endian and little-endian machines store bytes in memory differently.

    Big endian stores the most significant byte at the lowest memory address, whereas little endian stores the least significant byte at the lowest memory address.

    Example: the following integer variable (assuming int = 4 bytes): 0x1A1B1C1D

    in big endian would be stored in memory as (assuming addresses grow from left to right)
    1A | 1B | 1C | 1D
    in little endian would be
    1D | 1C | 1B | 1A
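
    The same layout can be checked on a real machine by copying the integer into a byte array. A minimal sketch; the function names bytes_in_memory and host_is_little_endian are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of value into out[0..3], lowest address first. */
static void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, 4);
}

/* Returns 1 if the least significant byte sits at the lowest address,
 * 0 otherwise. */
static int host_is_little_endian(void)
{
    unsigned char b[4];

    bytes_in_memory(0x1A1B1C1Du, b);
    return b[0] == 0x1D;
}
```

    Calling bytes_in_memory(0x1A1B1C1D, b) fills b with 1A 1B 1C 1D on a big-endian host and with 1D 1C 1B 1A on a little-endian one.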


    If I'm not mistaken, of course :P. There's lots of info on the web about big/little endian:

    http://en.wikipedia.org/wiki/Endianness

  4. #4
    Registered User
    Join Date
    Nov 2007
    Posts
    19
    Quote Originally Posted by brewbuck View Post
    Strictly, endianness does not apply to the smallest units of addressable storage, i.e. bytes.
    It does apply exactly to byte packing order >.<

  5. #5
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by force of will View Post
    It does apply exactly to byte packing order >.<
    But not within bytes themselves, which is the whole point of this conversation. Endianness is a property of the machine. But what is being described here is merely a property of the compiler.

    Apparently, gcc will pack the first entries of a bitfield into the MSB on "big endian" machines, and into the LSB on "little endian" ones. But it doesn't have to do that. Has nothing to do with machine endianness.

  6. #6
    Registered User
    Join Date
    Nov 2007
    Posts
    19
    Quote Originally Posted by brewbuck View Post
    But not within bytes themselves, which is the whole point of this conversation. Endianness is a property of the machine. But what is being described here is merely a property of the compiler.
    I thought you were referring to byte packing order. It's true that inside the bytes themselves nothing changes.


    It has to do with network communication: it seems that most protocols (if not all) work in big endian.

    So if you're on a little-endian computer, the multi-byte values you receive must be reordered for your machine, and reordered again before you send them over the network.
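
    As a rough sketch of that reordering, the two functions below read and write a 32-bit value in big-endian (network) order using shifts only, so they behave the same on any host. The names load_be32 and store_be32 are made up for this example; POSIX systems also provide ntohl and htonl in <arpa/inet.h> for the same job.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value from a byte buffer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Hypothetical helper: write a 32-bit value to a buffer in big-endian order. */
static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

    Since the code never looks at how the host stores a uint32_t in memory, no #if on the byte order is needed.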


    http://www.ibm.com/developerworks/ai...x.html?ca=drs-

  7. #7
    Registered User
    Join Date
    Oct 2007
    Posts
    22
    Well, that makes sense (gcc packing the bits in a different order).
    But what still bothers me: there is nothing like that defined in the C standard, at least as far as I know?
    So I might get wrong results when using a compiler other than GCC, because it decides the other way round, or possibly only knows one way of doing it (the same on big- and little-endian machines)?
    That said, the second approach seems to be much safer...

    Many thanks so far!
    Rafael

  8. #8
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    The standard leaves pretty much everything about bit-fields implementation-defined. I don't think you can even use the endianness of the machine to infer the order of bit-fields.

    For all practical purposes, bit-fields are useless for pulling apart external data formats.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  9. #9
    Registered User
    Join Date
    Oct 2007
    Posts
    22
    Ok, that's what I thought. But nevertheless I wonder why this strange 'method' is used within Linux Source, and also FreeBSD as I just found out, and possibly even more ...

    Rafael

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Over-reliance on assuming gcc is the compiler springs to mind.

  11. #11
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You can PROBABLY rely on gcc to implement bitfields in a relatively deterministic way - that is, for most processor architectures, the bitfields will work deterministically as long as you use the same compiler vendor.

    The fact that bitfields are "endian-sensitive" comes from this:
    Code:
    struct bits {
        unsigned a:4;
        unsigned b:5;
        unsigned c:7;
    };
    For this to be at least somewhat sensible on a machine with a 32-bit word, you expect a to be either the highest or lowest bits, and b to be the "next" 5 bits of that word. This in turn means that one bit from b is in the "next byte" after a. Follow me so far? Well, if we change the byte order, the order of these bitfields would also have to change accordingly.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  12. #12
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    More problems come from this.
    Code:
    struct bits {
      unsigned a:20;
      unsigned b:20;
    };
    On a 32-bit machine, a and b would be in separate words, with a gap of 12 unused bits between them. On a 64-bit machine this invisible padding would disappear.
    If there are any holes in your bit-fields, you need to be specific about where they are. Using the ":0" field width to indicate to the compiler to move to the next storage unit won't produce the desired effect.
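
    A quick way to find out how much hidden padding a given compiler adds is to compare the size of the struct with the 40 bits that were actually declared. This is only a sketch; the struct tag wide_bits and the function padding_bits are names invented for the example, and the number returned depends on the implementation.

```c
#include <limits.h>
#include <stddef.h>

struct wide_bits {
    unsigned a:20;
    unsigned b:20;
};

/* Hypothetical probe: number of bits in the struct that are not covered
 * by the 40 declared bit-field bits. The count includes any gap between
 * a and b as well as any trailing padding. */
static size_t padding_bits(void)
{
    return sizeof(struct wide_bits) * CHAR_BIT - 40;
}
```

    The function only reports how many padding bits there are, not where they sit, which is exactly the part the standard leaves to the implementation.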

  13. #13
    Registered User
    Join Date
    Oct 2007
    Posts
    22
    Quote Originally Posted by matsp View Post
    You can PROBABLY rely on gcc to implement bitfields in a relatively deterministic way - that is, for most processor architectures, the bitfields will work deterministically as long as you use the same compiler vendor.

    The fact that bitfields are "endian-sensitive" comes from this:
    Code:
    struct bits {
        unsigned a:4;
        unsigned b:5;
        unsigned c:7;
    };
    For this to be at least somewhat sensible on a machine with a 32-bit word, you expect a to be either the highest or lowest bits, and b to be the "next" 5 bits of that word. This in turn means that one bit from b is in the "next byte" after a. Follow me so far? Well, if we change the byte order, the order of these bitfields would also have to change accordingly.

    --
    Mats
    Yes, I can follow you so far. Makes total sense! But what about

    Code:
    struct test {
       unsigned int a:4;
       unsigned int b:4;
    };
    Here I just have one byte finally, 4 bits & 4 bits...so no byte intersections, why care about the ordering? Possibly because of how the compiler tends to pack the bits.

    Rafael

  14. #14
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Yes, it matters, because the compiler will still order the bits in the "little" or "big" endian way.

    In a big endian machine, you expect the high bits to come first. Yes, the most significant BYTE is still the most significant BYTE, and bits within it are still in a determined order, but if you look at the bits (7 being the most significant), big endian gives:
    76543210
    aaaabbbb
    whilst little endian gives:
    76543210
    bbbbaaaa

    Of course, if you have an unsigned char that you "and with 0x0F" or "and with 0xF0", both architectures will still work the same.

    --
    Mats

  15. #15
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by plan7 View Post
    Ok, that's what I thought. But nevertheless I wonder why this strange 'method' is used within Linux Source, and also FreeBSD as I just found out, and possibly even more ...
    gcc promises all sorts of things above and beyond the standard. Long ago it was decided that to compile the Linux kernel, you needed gcc. So kernel and system developers are free to take advantage of any gcc property they feel like.

    Obviously there is no need for system-level code to conform to any standard so long as it works correctly.

    Conversely, the gcc people work to try to maintain backward compatibility. But NONE of this is "standard."
