Thread: Fiddling with the bits(Big Endian vs Little Endian)

  1. #31
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    I2C is one of those big-endian serial protocols. From what I understand about it, I2C relies on this property to determine which module on the bus has the highest priority to send a packet when multiple modules are trying to send a packet at the same time.
    Not quite - the bus would work exactly the same if the bit-endianness had been chosen the other way around - but it's a standard now.

    It would be too far off topic to explain how it works, but I'll leave a link to wiki if anyone is interested - I²C - Wikipedia, the free encyclopedia
    Fact - Beethoven wrote his first symphony in C

  2. #32
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by christop View Post
    Also, I retract part of my last post where I said that registers had no endianness. Actually they do. Most architectures put the least-significant bit at the bit address 0 and the most-significant bit at the bit address 31 (or 15 or 7 or whatever, depending on the register width). Those are little-endian (at the bit level) systems. Some architectures number their bits the other way, so they're big-endian (at the bit level).
    I'm not sure what architecture you are referring to, but I know of no architecture where it is possible to ask for "the 0'th bit" of a register. Yes, there are instructions like BSF and BSR (on x86), but those instructions explicitly indicate the direction of bit order and you get your choice. Now, if you read the manual for the CPU and you find drawings in there that label the least-significant bit as "0", that just means the person who wrote the manual chose that convention. Unless there is some machine instruction which means "give the n'th bit of X", you cannot talk at all about "endianness" of bits in a register.

    And even if there was, you could just as well say that it is the assembly language which defines the convention. I could always alter the assembly language so that when using such an instruction to access the "n'th bit" it actually encodes an instruction which accesses the "31-n'th" bit, thereby swapping the "endianness." It's all a bunch of nonsense unless you can directly index a bit from RAM.

    Not to mention that whether the lsb of a register is bit 0 or bit 31 (or 63) is completely irrelevant to anything you could possibly want to do.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  3. #33
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    955
    Quote Originally Posted by brewbuck View Post
    I'm not sure what architecture you are referring to, but I know of no architecture where it is possible to ask for "the 0'th bit" of a register.
    Motorola 68000 has btst, bset, bclr, bchg instructions that take a bit number, either as an immediate value encoded in the machine code instruction or as a value from a register. Bit 0 is the lsb on 68000.

    The Z80 can also address a bit in a register with a few similar bit instructions. I'd be surprised if x86 didn't also include similar bit instructions because Z80 and x86 have a common ancestry. Bit 0 is the lsb on Z80 too.

    I don't consider the 68000 and Z80 to be exotic processors. (Actually, they are fairly mainstream since they can be found in TI graphing calculators).

    Now, if you read the manual for the CPU and you find drawing in there that label the least-significant bit as "0" then that just means the person who wrote the manual chose that convention. Unless there is some machine instruction which means "give the n'th bit of X" then you cannot talk at all about "endianness" of bits in a register.

    And even if there was, you could just as well say that it is the assembly language which defines the convention. I could always alter the assembly language so that when using such an instruction to access the "n'th bit" it actually encodes an instruction which accesses the "31-n'th" bit, thereby swapping the "endianness." It's all a bunch of nonsense unless you can directly index a bit from RAM.

    Not to mention that whether the lsb of a register is the 0 bit or the 31 (or 63) bit, is completely irrelevant to anything you could possibly want to do.
    As I mentioned with the 68000 and Z80, the bit endianness is baked into the architecture. It's not just a convention made up by the manual writer or assembler writer. And I think the 68000 can directly index a bit in RAM with one of the same bit instructions.

    Some architectures (such as the S/360, PowerPC, and PA-RISC) are big-endian at the bit level. In those architectures the msb of the registers (and probably in RAM too) is bit 0. This has some of the same advantages and disadvantages that big-endian ordering has at the byte level.

    There is plenty of literature online from experts in the field regarding bit endianness (such as Wikipedia and the in-depth article by Danny Cohen that I linked to already), but if you're still convinced that endianness does not apply to bits, I don't know of anything else I can do or say to convince you otherwise.
    Last edited by christop; 10-27-2012 at 10:30 AM. Reason: preview is broken with Javascript disabled :(

  4. #34
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Just frigging stop, you're confusing newbies with your rubbish!

    By definition, binary digit N -- also known as bit N -- has value 2^N.

    The least significant digit in a binary integer is digit zero, which corresponds to the decimal number 1 = 2^0.

    When written on paper or on screen, our mathematical conventions unequivocally state that the least significant digit is written in the rightmost position, with the other digits in increasing order of significance leftward. For real numbers, a decimal point (. or , depending on locale and conventions) is written between digits 0 and -1.

    There is only one bit order in programming, because programming is a form of math, and above is the mathematical definition for binary numbers.

    Whereas byte order is a storage convention, bit order only applies when information is transferred as individual, separate bits. There is no bit endianness even then, because the separate bits are always numbered. We just find the term bit order convenient, because the bits are always ordered either least significant bit first, or most significant bit first.

    In parallel protocols, you have as many wires as you have bits transferred in one unit. Because the bits are sent and received in parallel, there is no order between them.

    In simple serial protocols, you have one bit transferred at a time, one after another. They all explicitly state the order in which the bits in an N-bit word are transferred. No matter what your CPU or MCU is, the math to pick out the next bit in a machine word to send/receive is the same for a given serial protocol. Just read that again, until you grok it. It means that you only worry about the order of bits when you send/receive them individually, and even then you always use the order you have agreed with the recipient. There are no conditionals based on the CPU/MCU; it is always done the same way.
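    As an illustration of that point, here is a sketch in C of shifting the bits of one byte out in an agreed order; the arithmetic is identical on every CPU, and the protocol, not the hardware, decides which loop is used (bits are just collected into an array here instead of being driven onto a wire):

```c
#include <stdint.h>

/* Emit the bits of one byte, lsb first: bit number i goes out i'th. */
void send_lsb_first(uint8_t byte, int out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (byte >> i) & 1;
}

/* Emit the bits of one byte, msb first: bit number 7-i goes out i'th. */
void send_msb_first(uint8_t byte, int out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (byte >> (7 - i)) & 1;
}
```

    Either function compiles and runs unchanged on a big-endian or little-endian machine; which one you call is fixed by the agreement with the recipient.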

    (Note that if you send/receive numeric units larger than what you can fit in a machine register, and you wish to have a specific byte order for the data in memory, then you worry about the byte order. Bit order is irrelevant even then.)

    More complex serial protocols send groups of bits in parallel, but otherwise the exact same logic applies as to simple serial protocols.

    The microcode logic that CPUs and MCUs use to work on multi-bit words may have a bit endianness. That logic is never visible to the programmer, unless the implementation of the microcode has an error. The situation is exactly the same as if you had just one-bit bytes: where programmers sometimes concern themselves with byte order, hardware designers sometimes need to worry about bit order, because the stuff they do is similar to programming with one-bit words.

    Simply put, if you are a programmer, then there is only one possible bit order, the one that is defined by math. If you transfer individual bits, then you need to agree on which order and how many with the recipient -- but that is always the same regardless on what hardware you do it on; it too is just math, and agreement with the recipient.

    Fornicate this. I guess you can show the writing, but not make anyone read or understand.

  5. #35
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    955
    You're right, we are confusing newbies. (I think that's what I do best!) But what I write is not rubbish!

    C can't address individual bits in a byte (and bitfields don't really count, since those have an implementation-defined order), even if many common architectures can natively address bits. C doesn't care about the bit endianness of the target, and the C programmer shouldn't either. The compiler will often convert bit mask operations into bit test/set/clear/change instructions using the appropriate bit numbering for the system, whether the system calls the lsb number 0 or 7 or 15 or 31 or 63.
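    For instance, the usual C idioms only ever mention the mathematical bit number n (the bit worth 2^n); whatever index the target ISA's bit instructions use is the compiler's problem. A minimal sketch:

```c
#include <stdint.h>

/* Portable bit operations written against the mathematical bit number n.
   A compiler is free to lower these to BSET/BCLR/BTST-style instructions
   using whatever bit index the target architecture defines. */
uint32_t set_bit(uint32_t x, unsigned n)   { return x |  (UINT32_C(1) << n); }
uint32_t clear_bit(uint32_t x, unsigned n) { return x & ~(UINT32_C(1) << n); }
uint32_t flip_bit(uint32_t x, unsigned n)  { return x ^  (UINT32_C(1) << n); }
int      test_bit(uint32_t x, unsigned n)  { return (x >> n) & 1; }
```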

    You are correct that mathematical conventions state that number 0 of something (such as a bit or byte) is the least significant in a larger unit (such as a byte or word), but computer architectures don't always follow those conventions. In a big-endian system, byte 0 in a word is actually the most-significant byte. In the words of Cohen: "Remember: the Big-Endians were the outlaws." You have to know how big the word is to know what its significance (or "value" or "weight") is. The same applies to those uncommon bitwise big-endian systems.

  6. #36
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by christop View Post
    whether the system calls the lsb number 0 or 7 or 15 or 31 or 63.
    The least significant bit is always bit 0. You can "call" it something else just as much as you can call the decimal number "5" (or any other number) the smallest positive integer.

    Quote Originally Posted by christop View Post
    In big-endian system, byte 0 in a word is actually the most-significant byte.
    Nope. Given a multi-byte value stored somewhere, the byte at index 0 is the most significant byte. That is the big-endian byte order.

    If you see a chart about bits and bytes packed to a single value (say, a register), the byte numbers refer to indices when stored. (Unless, of course, the chart explicitly states some other numbering system.)

    On architectures where the processor can address a byte or other unit within a register, the bits affected in the larger unit are always explicitly defined; there is no "bit order" or "byte order" there, just a possibility of manipulating specific bits of a register as a separate unit.

    Usually, these "bytes" are labeled using letters, or letter-number conventions, just to avoid any confusion with byte order. On x86, bits 0 to 7 of the four main registers are accessible as al, bl, cl, and dl, and bits 8 to 15 as ah, bh, ch, and dh. Bits 0 to 15 are accessible as ax, bx, cx, and dx; bits 0 to 31 as eax, ebx, ecx, and edx; and bits 0 to 63 as rax, rbx, rcx, and rdx (if the CPU is capable of AMD64). MMX and SSE extensions to x86 do specify units within an MMX or XMM register with numbers, but always with unit 0 in the least significant bits. It is not a "bit order", just a documentation convention; for example, the 8-bit constant used in the shuffle (swizzle) operation uses a complicated numbering scheme that does not exactly follow the documentation convention. In fact, you could rewrite the SSE and MMX specifications using 0 for the most significant unit, and nothing would change from the programmer's perspective!

    Quote Originally Posted by christop View Post
    The same applies to those uncommon bitwise big-endian systems.
    That is just the rubbish I was talking about. It just does not make any sense! There is no "bitwise big-endian", or "bitwise little-endian", or any "bitwise endian".

    On all systems the bit order is the same, the one that is set by binary algebra, with bit 0 being the least significant bit in a binary integer.

    I give up. I don't think I can help any of you see the facts.

  7. #37
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    955
    I think part of our disagreement stems from my loose terminology in my previous posts. In many places where I said "number" (or where I omitted the word "number" as in "byte 0") I should have said "index". Let me try again with stricter use of these words.

    From this post onward I will use the following conventions:

    • byte number n
    • bit number n

    These are the mathematical definitions. Bit or byte number n has the significance b^n, where b is the base (2 for bit, 256 for byte, assuming 8-bit bytes). Byte or bit number 0 is always the least-significant byte or bit in a word or byte.

    Then we have the architecture's definitions or conventions:

    • byte[i]
    • bit[i]


    This is the byte or bit at index i, where index 0 is the first bit or byte in time for communication protocols or the first bit or byte in a memory address space. The numerical value or significance of the bit or byte at index 0 depends on the endianness of the system.

    I want to clarify that endianness is defined as the order of units (eg, bits or bytes) within a larger unit (eg, byte or word) in terms of the smaller unit's indices. Byte and bit endianness is defined by the architecture or protocol. I'll use the notation bit:byte to mean the order of bits within a byte, byte:word the order of bytes within a word, and word:long the order of words within a long (bit/byte/word/long have sizes 1/8/16/32 in this discussion, though obviously different systems can define them differently, and many systems have more than these four units).

    Also, there really exist only two machine endiannesses: big and little. In so-called "mixed endian" or "middle endian" systems, the endianness of different unit pairs is different, such as byte:word and word:long, so that the endianness of byte:long appears to be neither big nor little. In a PDP-11, for example, byte:word is little endian while word:long is big endian. For a modern-day example, IP over Ethernet is also "mixed endian": Ethernet is bit:byte little-endian (lsb is sent first) but IP is byte:word (and for units larger than word) big-endian ("network byte order").

    In a byte:word little-endian machine, byte[n] = byte number n. Likewise, in a bit:byte little-endian machine, bit[n] = bit number n for all values of n. Most architectures and protocols are bit:byte little-endian.

    In a byte:word big-endian machine, byte[1-n] = byte number n. For bit:byte, bit[7-n] = bit number n. For bit:word, bit[15-n] = bit number n. For bit:long, bit[31-n] = bit number n. In any case, byte[0] or bit[0] is the most-significant byte or bit within a larger unit. Relatively few architectures and protocols are bit:byte big-endian (the PowerPC architecture and the I2C protocol being notable examples).
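    The index-to-number mapping above can be written out as one line of C; this is a sketch of the convention described in this post, not any particular ISA's rule:

```c
/* Map an architecture's bit index to the mathematical bit number, for a
   unit of width_bits bits: in a little-endian bit numbering the index is
   the bit number; in a big-endian one, index 0 names the msb. */
unsigned bit_number_from_index(unsigned index, unsigned width_bits, int big_endian)
{
    return big_endian ? (width_bits - 1u - index) : index;
}
```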


    Now, from a C perspective, there is no bit:byte endianness. A byte (or rather, a char) is the smallest addressable unit in C. You simply cannot access a specific bit index in C without prior knowledge of the bit endianness of the target machine. You could say that the C abstract machine treats bits in parallel, as a parallel communication protocol does. All operations are done in terms of the mathematical definitions, such as (1<<n) being bit number n, or you can say that 0x80 has bit number 7 set, or you can say that 0x4200 has the MSB of a 2-byte word set to 0x42. In the last example, the byte order makes no difference either, unless you want to access the individual bytes in the word directly.
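    A quick illustration of that last paragraph: both checks below are defined on values, so they give the same answer regardless of how the target stores bytes or labels bits:

```c
#include <stdint.h>

/* Value-level checks: C shifts and masks operate on the numeric value,
   so neither the byte order nor any bit labeling of the target matters. */
int has_bit_number_7(uint8_t b)  { return (b >> 7) & 1; }     /* e.g. 0x80 */
int msb_byte_of_word(uint16_t w) { return (w >> 8) & 0xFF; }  /* e.g. 0x42 of 0x4200 */
```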

    Some other high-level languages can address bits and they define which endianness they use. Algol 68, for example, has an elem operator to address a bit in a word, where bit[1] is the msb. This is an oddball language, to be sure, as it starts its bit indices from 1. Ada, on the other hand, allows the programmer to set the bit endianness when accessing bits in words.


    Quote Originally Posted by Nominal Animal View Post
    On all systems the bit order is the same, the one that is set by binary algebra, with bit 0 being the least significant bit in a binary integer.
    That is true by definition. By the same definition, all systems have the same byte order, with byte number 0 being the least significant byte in a word. If you are instead talking about bit and byte indices, that is clearly false (as byte index 0 can be the MSB and bit index 0 can be the msb). Not all systems and protocols have the same bit and byte order.

    The one thing I don't understand is why you think that endianness applies to bytes but not to bits. The two are directly analogous! Bytes clearly have an order within a word. Bits clearly have an order within a byte. Both orders are defined by the architecture or protocol. When the byte order matches the mathematical order, it's little endian; otherwise it's big endian. When the bit order matches the mathematical order, it's little endian; otherwise it's big endian. It's as simple as that.




    TL;DR version: To refute the fact that bit endianness exists, you will have to prove that no system exists that calls the most-significant bit in a byte (or word or long etc) "bit index 0". That task will be just as difficult (ie, impossible) as proving that no system exists that calls the most-significant byte in a word "byte index 0". Good luck proving the non-existence of existing systems.

  8. #38
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    To load accumulator A with the binary value of one for the Motorola 68HC11, the following is done

    LDAA #%00000001

    Are you saying that on another system I would have to enter %10000000 for the value one? Or compare a value with %10000000 to see if it is even?
    Fact - Beethoven wrote his first symphony in C

  9. #39
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    955
    Quote Originally Posted by Click_here View Post
    To load accumulator A with the binary value of one for the Motorola 68HC11, the following is done

    LDAA #%00000001

    Are you saying that on another system I would have to enter %10000000 for the value one? Or compare a value with %10000000 to see if it is even?
    Nope. The value is %00000001 wherever it is used. But to change that 1 bit with one of the bit-changing instructions (such as BSET or BCLR) you have to know the bit endianness of the system. It might be index 0 or 7 or something else depending on the size of the register. Most likely it is index 0 (little endian), but some systems would say it is index 31 in a 32-bit register.

  10. #40
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    Can you provide a real example of a bit-set instruction where bit-endianness is demonstrated, along with a datasheet?
    Fact - Beethoven wrote his first symphony in C

  11. #41
    Registered User
    Join Date
    Sep 2006
    Location
    Beaverton, Oregon, United States
    Posts
    176
    Thanks guys for all this help. Its a lot to absorb. I'll have to study on this.

  12. #42
    Registered User
    Join Date
    Sep 2006
    Location
    Beaverton, Oregon, United States
    Posts
    176
    Hahaha@ flipping my PC over and now its Little Endian. Yeah holy mother of God I'm confused.

  13. #43
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    955
    Ack! I nearly forgot about this topic!

    Click_Here: If you look at the PowerPC Software Reference Manual you'll see that everything treats the msb as bit 0. The PowerPC ISA does not include any general bit-set instructions (it seems that most bitwise big-endian ISAs are RISC, which tend not to have bit-set instructions, as the same operations can be performed by "and"/"or"/"xor" instructions). However, it does have some special-purpose bit-setting instructions, such as mtfsb0, which sets a specified bit of the FPSCR to 0, with bit 0 being the msb. There's also the instruction cntlzw, which counts leading zeros starting at bit 0 in a word. The count returned by that instruction is also the msb-is-0 index of the first nonzero bit (to get the mathematical bit number you subtract that value from 31).
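    As a hedged aside, GCC and Clang expose the same operation in C as the __builtin_clz intrinsic, which mirrors cntlzw, so the subtract-from-31 step can be sketched like this (like the builtin, it is undefined for a zero input):

```c
#include <stdint.h>

/* __builtin_clz (GCC/Clang) counts leading zeros of a 32-bit value, like
   PowerPC's cntlzw. For nonzero x the count is the msb-is-0 index of the
   first set bit; 31 minus the count is the mathematical bit number. */
unsigned highest_set_bit_number(uint32_t x)  /* requires x != 0 */
{
    return 31u - (unsigned)__builtin_clz(x);
}
```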

    That's what I've been able to find with a quick search so far.

    As an aside, I learned that PowerPC has an "Old MacDonald" instruction (eieio--Enforce In-Order Execution of I/O).

  14. #44
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by christop View Post
    If you look at the PowerPC Software Reference Manual you'll see that everything treats the msb as bit 0.
    It just labels the bits that way.

    It is a documentation convention that has absolutely nothing to do with hardware -- and is a bad one, too.

    In particular, the mathematical definition of the most significant bit depends on the register size. If you look at later POWER architecture docs, you'll see the feeble attempts at clearing the confusion in the Notation sections; for example, the Power ISA v2.06B:
    For all registers except the Vector category, bits in registers that are less than 64 bits start with bit number 64-L, where L is the register length; for the Vector category, bits in registers that are less than 128 bits start with bit number 128-L.



    Please, let me reiterate.

    Bit is short for Binary Digit. In a binary number, the most significant digit (msb) is leftmost, least significant digit (lsb) rightmost.

    Because binary numbers follow radix-2 positional notation, the rightmost/least significant digit (bit) corresponds to 2^0, the second to 2^1, the third to 2^2, and so on.

    Up to this point everyone agrees, even the PDF you linked.

    If you label the digit just to the left of the decimal point as digit 0, then digit N corresponds to 2^N. Then, the least significant bit is bit 0. (You can even use negative indices for fractional bits, the first fractional bit being bit -1.)

    This lsb-is-bit-0 is the only labeling scheme worthwhile for a programmer, because it is based on basic arithmetic, and is universal over all architectures and programming languages, with no exceptions, ambiguities, or dependencies.

    Because it is a labeling scheme, it has nothing to do with hardware. It is just basic arithmetic.

    There is no hardware that requires or is tied to some other labeling scheme. (There is really no defending any other labeling schemes, because this one is the simplest; the universal one. Why make life more complicated than it already is?)

    Even if you consider a computer based on non-binary logic, say ternary logic ("trits" instead of "bits"), this labeling scheme still works, because it is based on basic arithmetic; it's pure math.

    Because it is just a labeling scheme, there is no "bit endianness". Endian refers to byte order. Bits are always in the same order, lsb to msb; only the labels humans use for them differ.

  15. #45
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    Thanks for that christop - But I couldn't find any bit-endianness explained or being used. Note that I did find a lot on byte-endianness though.

    Quote Originally Posted by christop
    ...mtfsb0, which moves a value to bit 0 (msb) of the FPSCR

    Here is an example I found from the IBM Help
    Quote Originally Posted by IBM
    The mtfsb0 instruction sets the Floating-Point Status and Control Register bit specified by BT to 0.
    ...
    The following code sets the Floating-Point Status and Control Register Floating-Point Overflow Exception Bit (bit 3) to 0:
    Code:
    mtfsb0 3
    # Now bit 3 of the Floating-Point Status and Control
    # Register is 0.
    And also with cntlzw, I could not find any bit-endianness in the description or example

    Counts the number of leading zeros of the 32-bit value in a source general-purpose register (GPR) and stores the result in a GPR
    Code:
    # Assume GPR 3 contains 0x0FFF FFFF 0061 9920.
    cntlzw 3,3
    # GPR 3 now holds 0x0000 0000 0000 0009. Note that the high-order
    # 32 bits are ignored when computing the result.
    Would you be able to find something for us?
    Fact - Beethoven wrote his first symphony in C
