I seriously doubt that the byte swapping will take longer than the conversion:
BBBBBB00 GGGGBBBB RRGGGGGG RRRRRRRR --> BBBBBBBB GGGGGGBB RRRRGGGG 00RRRRRR
An alternative is to do the shift on a machine whose native endianness matches that of the data, or to change the code that generates the image data so that it emits the endianness the consumer expects.
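If neither option is available, the shift can also be written so that it does not depend on the host's endianness at all. The following is only a sketch, and it rests on an assumption of mine: that the bytes in the diagram above are listed in memory order with the least significant byte first, which makes the conversion a 2-bit right shift of each 32-bit little-endian word. The names `load_le32`, `store_le32` and `shift_pixels` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored least significant byte first.
   Works the same on little- and big-endian hosts. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value back as 4 bytes, least significant byte first. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Convert pixel_count packed pixels in place.
   Assumed input layout per word:  R << 22 | G << 12 | B << 2
   Resulting layout per word:      R << 20 | G << 10 | B       */
void shift_pixels(uint8_t *data, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t *p = data + 4 * i;
        store_le32(p, load_le32(p) >> 2);
    }
}
```

On a host whose endianness already matches the data, a compiler will typically reduce the load and store helpers to plain 32-bit accesses; on a mismatched host they amount to the byte swap, which is cheap next to the memory traffic of walking the image.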