All current architectures support data access in 8-bit bytes. Therefore, as each 8-bit byte is accessed as one unit, the bit-endianness within a byte is irrelevant. Only the byte order matters. Even the __BIG_ENDIAN, __LITTLE_ENDIAN, and __PDP_ENDIAN preprocessor macros refer to byte order, not the bit-endianness, of the architecture. (And my microcontrollers all document pins by bits, and shift register direction, explicitly; I don't need to know what bit order they use internally, as long as I know the byte order.)
So, forget bit order, and concentrate on the byte order. Unless the manufacturer/documentation tells you, you'd be hard-pressed to find anything that would even reveal the bit order (as opposed to the byte order).
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
I've written some code to transfer numeric data for IEEE-754 binary32 and binary64 types (float and double, respectively, on most architectures) and signed/unsigned two's complement 8, 16, 32, and 64-bit integers.
On very large simulations producing huge quantities of data, it is usually more efficient to let each processor save the data in its native format. When the data is processed, it is converted to standard form as part of the normal preprocessing, which would be done in any case. That preprocessing is I/O-bound, so the endianness correction is essentially free at that point.
Now, it turns out this is really simple to get right in practice.
16-bit integers may be in the expected order (AB), or their 8-bit components may be swapped (BA). 32-bit integers and binary32 can have one of four byte orders: ABCD, DCBA, BADC, or CDAB. The latter two are vanishingly rare (you need PDP or VAX, I think, on one end); in practice, you can ignore them. 64-bit integers and binary64 can technically have even more byte orders, but only ABCDEFGH and HGFEDCBA are encountered outside emulators and vintage hardware.
If you save one 16-bit integer, one 32-bit integer, one 64-bit integer, one binary32, and one binary64 prototype in your file (208 bits, or 26 bytes total), you can not only automatically detect the endianness of the data relative to the current architecture (and adjust accordingly), but also use those 26 bytes as the file signature. Even if the normal toolchain cannot process the data, the data is never lost: the prototypes let you support any byte order, however inconceivable, should you ever encounter one.
Typical integers are 0x1234, 0x01020304, and 0x0807060504030201. For binary32, I like 721409.0/1048576.0, which corresponds to 0x3F302010 (decimal 1060118544) on architectures that use the same endianness for IEEE-754 binary32 and 32-bit two's complement integers. Note that none of these have any repeating bytes. Any value (that is easy to compute exactly) whose binary representation has a different value in each byte works for this.
This is also the reason why I don't recommend using a value like 0x0001 for such testing: its repeated zero bytes can mask a mistake, such as accidentally using a different size than you expected.
What I like to do for the prototype values is to use a helper function that can read the prototype value in any possible byte order. I simply try all the byte orders to see which one gives me the prototype value I expect. If none does, something is wrong; perhaps the file is a different version. Otherwise, the observed byte order tells me exactly how I need to swizzle the bytes to get them into the native byte order on the current architecture. For example:
Code:
#include <stdint.h>
#include <string.h>

static inline float float_get(const void *const data, const int byte_order)
{
    uint32_t value;
    memcpy(&value, data, sizeof value);  /* memcpy avoids strict-aliasing issues */

    /* Swap adjacent bytes */
    if (byte_order & 1)
        value = ((value & 0x00FF00FFU) << 8)
              | ((value >> 8) & 0x00FF00FFU);

    /* Swap 16-bit halves */
    if (byte_order & 2)
        value = ((value & 0x0000FFFFU) << 16)
              | ((value >> 16) & 0x0000FFFFU);

    float result;
    memcpy(&result, &value, sizeof result);
    return result;
}

static inline double double_get(const void *const data, const int byte_order)
{
    uint64_t value;
    memcpy(&value, data, sizeof value);

    /* Swap adjacent bytes */
    if (byte_order & 1)
        value = ((value & UINT64_C(0x00FF00FF00FF00FF)) << 8)
              | ((value >> 8) & UINT64_C(0x00FF00FF00FF00FF));

    /* Swap byte pairs */
    if (byte_order & 2)
        value = ((value & UINT64_C(0x0000FFFF0000FFFF)) << 16)
              | ((value >> 16) & UINT64_C(0x0000FFFF0000FFFF));

    /* Swap 32-bit halves */
    if (byte_order & 4)
        value = ((value & UINT64_C(0x00000000FFFFFFFF)) << 32)
              | ((value >> 32) & UINT64_C(0x00000000FFFFFFFF));

    double result;
    memcpy(&result, &value, sizeof result);
    return result;
}
The key point of this approach is that you do not even need to know which byte order the current architecture uses, let alone what the originating architecture used. You only find the byte-order swizzle that reproduces the prototype values you expect; that same swizzle is then the operation required to convert all such data to the current architecture.