The way I've solved similar issues is to use the fixed-size integer types (int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, and uint64_t), and Binary32 (float) and Binary64 (double) for floating-point data, with each sender using its native format and each receiver converting the read data to native format on the fly. To determine the necessary byte order conversion, the endpoints use a "handshake" which contains a prototype value of each type (with a different byte value in each component).
For example, assume you transfer float values. A prototype value I've used is 721409.0 / 1048576.0 = 0.68798923492431640625, which has the bit pattern 00111111 00110000 00100000 00010000 in binary, i.e. 0x3F302010 as an integer in hexadecimal. Thus, each byte in the received prototype value must be either 0x3F = 63, 0x30 = 48, 0x20 = 32, or 0x10 = 16; only their order can vary. All float values will have that same byte order, so the reordering that makes the prototype value come out correct is also the reordering you need to apply to any received float to interpret it correctly.
The same approach works for integers, too, of course.
The receiver only needs to find the byte order permutation that reproduces its native prototype value. In practice, you can expect only two: either no permutation (same byte order), or fully reversed byte order. In the past there were architectures where you would have had to support all four permutations (1234, 4321, 2143, 3412) for 32-bit and 64-bit types, but I haven't seen anything except same (1234) or reversed (4321) byte order in practice.
On current architectures integers and floats have the same byte order, but that may not necessarily be true in the future.
There are many ways to manipulate the byte order, but I prefer the arithmetic one. For example:
Code:
#include <stdint.h>
#include <string.h>

uint32_t get_u32(const void *const dataptr, const unsigned char byte_order)
{
    uint32_t data;

    /* memcpy avoids the alignment and strict-aliasing problems
       that *(const uint32_t *)dataptr would cause. */
    memcpy(&data, dataptr, sizeof data);

    if (byte_order & 1)  /* Swap adjacent bytes: 1234 -> 2143 */
        data = ((data & 0x00FF00FFU) << 8U)
             | ((data >> 8U) & 0x00FF00FFU);
    if (byte_order & 2)  /* Swap 16-bit halves: 1234 -> 3412 */
        data = ((data & 0x0000FFFFU) << 16U)
             | ((data >> 16U) & 0x0000FFFFU);

    return data;
}

float get_float(const void *const dataptr, const unsigned char byte_order)
{
    const uint32_t data = get_u32(dataptr, byte_order);
    float result;

    /* Reinterpret the bits as a float, again via memcpy:
       *(float *)&data would violate strict aliasing. */
    memcpy(&result, &data, sizeof result);
    return result;
}
Note that you can simply try all possible byte_order values (four for 32-bit values), until the received prototype matches the expected prototype value. If none of them work, then the sender did not use the same representation, and the communication would not work anyway.
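That trial loop can be sketched roughly as follows (detect_byte_order and permute_u32 are names I'm inventing here for illustration; permute_u32 repeats the same arithmetic swizzle as get_u32 above so the sketch is self-contained):

```c
#include <stdint.h>
#include <string.h>

/* Same arithmetic permutation as in get_u32():
   bit 0 swaps adjacent bytes, bit 1 swaps 16-bit halves. */
static uint32_t permute_u32(uint32_t data, unsigned char byte_order)
{
    if (byte_order & 1)
        data = ((data & 0x00FF00FFU) << 8U) | ((data >> 8U) & 0x00FF00FFU);
    if (byte_order & 2)
        data = ((data & 0x0000FFFFU) << 16U) | ((data >> 16U) & 0x0000FFFFU);
    return data;
}

/* Try all four permutations of the received 32-bit prototype.
   Returns the matching byte_order (0..3), or -1 if the sender's
   representation is incompatible with ours. */
int detect_byte_order(const void *received_proto, uint32_t native_proto)
{
    uint32_t raw;
    memcpy(&raw, received_proto, sizeof raw);

    for (unsigned char order = 0; order < 4; order++)
        if (permute_u32(raw, order) == native_proto)
            return order;

    return -1;
}
```

The returned value can then be passed straight to get_u32()/get_float() for every subsequent field from that sender.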
I've found that for typical messages, the overhead incurred adjusting the byte order of received messages is minimal, truly negligible. One reason is that received data tends to be cache-hot, so the operations done on it rarely cause cache misses. The overhead tends to be just a few clock cycles per element.
The approach works very well even when messages are broadcast. Each recipient only needs to remember, per sender, which byte order swizzle to apply to that sender's messages. If an overhead of 2, 4, or 8 bytes per data type per message is not significant, you can even include the prototype value for each type in the message itself.
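If you do embed the prototypes, the handshake (or message header) might carry one per wire type. A hypothetical layout, with all the names invented for illustration:

```c
#include <stdint.h>

/* Hypothetical handshake message: one prototype per fixed-size type,
   written in the sender's native byte order. Each receiver compares
   these against its own native prototypes to pick the permutation
   per type (integers and floats may, in principle, differ). */
struct handshake {
    uint16_t proto_u16;  /* e.g. 0x0102 */
    uint32_t proto_u32;  /* e.g. 0x01020304 */
    uint64_t proto_u64;  /* e.g. 0x0102030405060708 */
    uint32_t proto_f32;  /* bit pattern of 721409.0f / 1048576.0f, 0x3F302010 */
};
```

In a real protocol you would serialize this to fixed offsets rather than rely on struct layout (padding varies between compilers); the struct only shows the idea.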
(Note that the situation is NOT symmetric: you cannot do the conversion on the sender side and achieve the same flexibility, because the sender would have to target each receiver separately. Since each message has only one sender but possibly multiple recipients, the recipient can always adapt to the sender, but not vice versa. So this is only really feasible on the recipient side.)