All current architectures support data access in 8-bit bytes. Therefore, as each 8-bit byte is accessed as one unit, the bit-endianness within a byte is irrelevant. Only the byte order matters. Even the __BIG_ENDIAN, __LITTLE_ENDIAN, and __PDP_ENDIAN preprocessor macros refer to byte order, not the bit-endianness, of the architecture. (And my microcontrollers all document pins by bits, and shift register direction, explicitly; I don't need to know what bit order they use internally, as long as I know the byte order.)
So, forget bit order, and concentrate on the byte order. Unless the manufacturer/documentation tells you, you'd be hard-pressed to find anything that would even reveal the bit order (as opposed to the byte order).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
I've written some code to transfer numeric data for IEEE-754 binary32 and binary64 types (float and double, respectively, on most architectures) and signed/unsigned two's complement 8, 16, 32, and 64-bit integers.
On very large simulations producing huge quantities of data, it is usually more efficient to let each processor save the data in its native format. When the data is processed, it is converted to standard form as part of the normal preprocessing, which would be done in any case. That preprocessing is I/O-bound, so the endianness correction is essentially free at that point.
Now, it turns out this is really simple to get right in practice.
16-bit integers may be in the expected order (AB), or their 8-bit components may be swapped (BA). 32-bit integers and binary32 can have one of four byte orders: ABCD, DCBA, BADC, or CDAB. The latter two are vanishingly rare (you need PDP or VAX, I think, on one end); in practice, you can ignore them. 64-bit integers and binary64 can technically have even more byte orders, but only ABCDEFGH and HGFEDCBA are encountered outside emulators and vintage hardware.
If you save one 16-bit integer, one 32-bit integer, one 64-bit integer, one binary32, and one binary64 prototype in your file (208 bits, or 26 bytes total), you can not only automatically detect the endianness of the data relative to the current architecture (and adjust accordingly), but also use those 26 bytes as the file signature. Even if the normal toolchain cannot process the data, the data is never lost: the prototypes let you support any byte order, however inconceivable, should you ever encounter one.
Typical integers are 0x1234, 0x01020304, and 0x0807060504030201. For binary32, I like 721409.0/1048576.0, which corresponds to 0x3F302010 (decimal 1060118544) on architectures that use the same endianness for IEEE-754 binary32 and 32-bit two's complement integers. Note that none of these have any repeating bytes. Any value (that is easy to compute exactly) whose binary representation has a different value in each byte works for this.
This is also the reason why I don't recommend using a value like 0x0001 for such testing: its repeated zero bytes can mask a mistake, such as accidentally using a different size than you expected.
What I like to do for the prototype values is to use a helper function that can read the prototype value in any possible byte order. I simply try all the byte orders to see which one gives me the prototype value I expect. If none does, something is wrong; perhaps the file is a different version. Otherwise, the observed byte order tells me exactly how I need to swizzle the bytes to get them into the native byte order on the current architecture. For example:
Code:
#include <stdint.h>
#include <string.h>

static inline float float_get(const void *const data, const int byte_order)
{
    uint32_t value;
    memcpy(&value, data, sizeof value);  /* memcpy avoids strict-aliasing issues */

    /* Swap adjacent bytes */
    if (byte_order & 1)
        value = ((value & 0x00FF00FFU) << 8)
              | ((value >> 8) & 0x00FF00FFU);

    /* Swap 16-bit halves */
    if (byte_order & 2)
        value = ((value & 0x0000FFFFU) << 16)
              | ((value >> 16) & 0x0000FFFFU);

    float result;
    memcpy(&result, &value, sizeof result);
    return result;
}

static inline double double_get(const void *const data, const int byte_order)
{
    uint64_t value;
    memcpy(&value, data, sizeof value);

    /* Swap adjacent bytes */
    if (byte_order & 1)
        value = ((value & UINT64_C(0x00FF00FF00FF00FF)) << 8)
              | ((value >> 8) & UINT64_C(0x00FF00FF00FF00FF));

    /* Swap byte pairs */
    if (byte_order & 2)
        value = ((value & UINT64_C(0x0000FFFF0000FFFF)) << 16)
              | ((value >> 16) & UINT64_C(0x0000FFFF0000FFFF));

    /* Swap 32-bit halves */
    if (byte_order & 4)
        value = ((value & UINT64_C(0x00000000FFFFFFFF)) << 32)
              | ((value >> 32) & UINT64_C(0x00000000FFFFFFFF));

    double result;
    memcpy(&result, &value, sizeof result);
    return result;
}
The key point of this approach is that you do not even need to know which byte order the current architecture uses, let alone what the originating architecture used. You only find the byte-order swizzle that reproduces the prototype values you expect; that same swizzle is then the operation required to convert all such data to the current architecture.