Thread: ASCII and "big endian"

  1. #1
    Madly in anger with you
    Join Date
    Nov 2005
    Posts
    211

    ASCII and "big endian"

    okay, so this doesn't really have much to do with C, however I don't know where else to ask it. if moderators feel that this is an inappropriate spot to post this, feel free to move it, and sorry for the inconvenience.

    I'm trying to write the characters FDG into an int using their hex values. so far, this is what I've established: 0x00464447.

    however I'm a little confused here. in the Windows PE file format's IMAGE_DOS_HEADER, there is the member e_magic (a WORD, or in native C, an unsigned short) which should be IMAGE_DOS_SIGNATURE for a valid executable file. IMAGE_DOS_SIGNATURE is defined like so:

    Code:
    #define IMAGE_DOS_SIGNATURE 0x5A4D
    this is the very first value in any executable file, which from a hex editor is visible as the ASCII characters MZ. what I don't understand here, is that if the hex value for the ASCII character 'M' is 0x4D, and for 'Z' it is 0x5A, wouldn't that be ZM?

    after doing a little research, I found something about big endian, and how it affects the order in which bytes are represented. I read a little about it at wikipedia but got lost in some complex explanations and equations.

    is 0x00464447 what I'm looking for here for FDG, or should it be 0x00474446? or neither of them, and if this is the case, could someone please explain why?


    thank you in advance.

  2. #2
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Endianness is an observable effect of multi-byte* storage. So it depends on what exactly it is you are storing into. If you only look at single bytes, it always looks the same.
    * Yes, I know what byte means.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  3. #3
    Madly in anger with you
    Join Date
    Nov 2005
    Posts
    211
    okay, after some more thorough reading, it looks like neither of my attempts was correct, and that it should really look like: 0x47444600.

    I tested it and it looks as it should, I'm storing it into a 32-bit value (unsigned long). Thanks Dave, and sorry to mods for posting without performing significant research.
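
    For anyone checking candidate constants the same way, here is a minimal sketch (not the poster's code) of packing characters into a 32-bit value with shifts. The helper name pack_chars_le is made up for illustration. It puts the first character in the least significant byte, which on a little-endian CPU is the lowest address, so "FDG" comes out as 0x00474446 and reads F D G 00 in a hex editor. A constant like 0x47444600 would instead sit in little-endian memory as 00 46 44 47, i.e. a zero byte followed by F, D, G, so which one is "right" depends on where in the four bytes the characters are supposed to land.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: pack up to 4 characters so that s[0] ends up in the
       least significant byte. On a little-endian CPU that byte is stored at
       the lowest address, so the characters appear in order in a hex dump. */
    uint32_t pack_chars_le(const char *s, size_t n)
    {
        uint32_t value = 0;
        size_t i;

        for (i = 0; i < n && i < 4; i++)
            value |= (uint32_t)(unsigned char)s[i] << (8 * i);
        return value;
    }
    ```

    Calling pack_chars_le("MZ", 2) gives 0x5A4D, which matches IMAGE_DOS_SIGNATURE.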

  4. #4
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    I'm not sure why you're storing characters in a short int, but if you're sending integers over the network, you should convert them to/from network byte order using htons()/ntohs() & htonl()/ntohl()...
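
    A rough sketch of what that looks like for a 32-bit value, assuming a POSIX system where htonl()/ntohl() come from <arpa/inet.h> (on Windows they live in the Winsock headers instead). The function names put_net32 and get_net32 are invented for this example.

    ```c
    #include <arpa/inet.h> /* htonl, ntohl (POSIX) */
    #include <stdint.h>
    #include <string.h>

    /* Store a host-order value into buf in network (big-endian) byte order. */
    void put_net32(unsigned char buf[4], uint32_t host_value)
    {
        uint32_t wire = htonl(host_value);
        memcpy(buf, &wire, sizeof wire);
    }

    /* Read a network-order value from buf and return it in host order. */
    uint32_t get_net32(const unsigned char buf[4])
    {
        uint32_t wire;
        memcpy(&wire, buf, sizeof wire);
        return ntohl(wire);
    }
    ```

    After put_net32(buf, 0x12345678) the buffer holds 12 34 56 78 regardless of the host's own byte order, and get_net32 reverses the conversion.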

  5. #5
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    On x86 systems, byte order is always swapped in memory.
    A 1 byte value of 0x12 is stored as 0x12 in memory.
    A 2 byte value of 0x1234 is stored as 0x34 0x12 in memory.
    A 4 byte value of 0x12345678 is stored as 0x78 0x56 0x34 0x12 in memory.
    Usually, since structs are read from and written to disk in their raw memory form, it also means the values on disk are byteswapped. But is this what you're after?
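
    Applied to the original question, a small sketch of the 2-byte case: splitting 0x5A4D into the bytes a little-endian machine would write, lowest address first. The helper name le16_bytes is just for illustration, and it uses shifts so the result does not depend on the CPU it runs on.

    ```c
    #include <stdint.h>

    /* Split a 16-bit value into the two bytes a little-endian machine stores,
       lowest address first. */
    void le16_bytes(uint16_t value, unsigned char out[2])
    {
        out[0] = (unsigned char)(value & 0xFF);        /* low byte, lower address   */
        out[1] = (unsigned char)((value >> 8) & 0xFF); /* high byte, higher address */
    }
    ```

    For 0x5A4D this yields 0x4D then 0x5A, which is 'M' followed by 'Z' in ASCII, and that is why a hex editor shows MZ at the start of the file.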
    Last edited by Elysia; 11-10-2007 at 09:40 PM.

  6. #6
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by Elysia View Post
    On x86 systems, byte order is always swapped in memory.
    [...]
    A 4 byte value of 0x12345678 is stored as 0x56 0x78 0x12 0x34 in memory (word swapped).
    Usually, since structs are read from and written to disk in their raw memory form, it also means the values on disk are byteswapped. But is this what you're after?
    Can you demonstrate this? My initial attempt failed.

  7. #7
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Dave_Sinkula View Post
    Can you demonstrate this? My initial attempt failed.
    Because it is untrue.

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Alright, so it seems even 4 bytes are swapped per byte. I remembered wrong >_<

    Code:
    #include <stdio.h>

    int main(void)
    {
        int nNum1 = 0x12;
        int nNum2 = 0x1234;
        int nNum4 = 0x12345678;
        /* View each int through an unsigned char pointer to see its bytes in address order. */
        unsigned char* pNum1 = (unsigned char*)&nNum1;
        unsigned char* pNum2 = (unsigned char*)&nNum2;
        unsigned char* pNum4 = (unsigned char*)&nNum4;

        fprintf(stdout, "Number 1 (1 byte): 0x%02X\n", pNum1[0]);
        fprintf(stdout, "Number 2 (2 bytes): 0x%02X%02X\n", pNum2[0], pNum2[1]);
        fprintf(stdout, "Number 3 (4 bytes): 0x%02X%02X%02X%02X\n", pNum4[0], pNum4[1], pNum4[2], pNum4[3]);
        return 0;
    }
    OUTPUT:
    Number 1 (1 byte): 0x12
    Number 2 (2 bytes): 0x3412
    Number 3 (4 bytes): 0x78563412
    Last edited by Elysia; 11-10-2007 at 09:36 PM.

  9. #9
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Elysia View Post
    On x86 systems, byte order is always swapped in memory.
    I don't think "byte swapping" is a good way to explain endianness. The bytes aren't "swapped" for no reason, it is simply adherence to a principle that the most significant byte comes first (or last) in memory.

    Your example of a 4-byte quantity on x86 is incorrect.

  10. #10
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by brewbuck View Post
    I don't think "byte swapping" is a good way to explain endianness. The bytes aren't "swapped" for no reason, it is simply adherence to a principle that the most significant byte comes first (or last) in memory.
    True enough, but you would expect a value of 0x1234 to be stored as 0x12 0x34 in memory, which isn't true on x86.

    Your example of a 4-byte quantity on x86 is incorrect.
    I just saw that >_<

  11. #11
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Elysia View Post
    On x86 systems, byte order is always swapped in memory.
    A 4 byte value of 0x12345678 is stored as 0x56 0x78 0x12 0x34 in memory (word swapped).
    Usually, since structs are read from and written to disk in their raw memory form, it also means the values on disk are byteswapped. But is this what you're after?
    Not quite, it's 0x78 0x56 0x34 0x12.
    All the bytes are always in reverse order. That even goes for 64-bit types such as __int64.
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  12. #12
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Yep, everyone is picking on me today
    But I already demonstrated I was wrong and edited the first post to match.

  13. #13
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Elysia View Post
    True enough, but you would expect a value of 0x1234 to be stored as 0x12 0x34 in memory, which isn't true on x86.
    Why would you expect any particular storage at all? The truth is, when you look at the data in a hex editor, the lower addresses are usually shown leftmost and topmost -- standard reading order. So the value 0x1234 will appear as the two bytes 12 34 only if the more significant byte is stored at the lower address, a.k.a. the "first" address.

    But there is no logical reason why we should consider lower addresses to come "before" higher addresses. It's just a human convention. Big-endian order means values "read" right -- unless you happen to read a language where the letters go right to left! Or, imagine a hex editor where the lowest address is displayed at the lower right, and increasing addresses go leftward and upward on the screen. There's nothing "wrong" with such an editor, and it would cause little-endian values to "read correctly" on the screen.

    This idea of "left is first" is so ingrained that we don't realize that it's completely artificial.

    Intel chips have traditionally been little-endian. This means values read "backward" in a hex editor, but this doesn't mean anything. In a way, it's more logical than big-endian ordering since the LEAST significant byte occurs at the LOWEST memory address. Small-to-small. Again, all totally arbitrary.
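
    To make the "lowest address" idea concrete, here is a minimal sketch with made-up helper names. host_is_little_endian looks at which byte of a known value sits at the lowest address, and load_le32 / load_be32 build a value out of four bytes using shifts, so they give the same answer on any CPU.

    ```c
    #include <stdint.h>

    /* Returns 1 if the host stores the least significant byte at the lowest address. */
    int host_is_little_endian(void)
    {
        uint16_t probe = 1;
        return *(const unsigned char *)&probe == 1;
    }

    /* Interpret 4 bytes with b[0] (lowest address) as the LEAST significant byte. */
    uint32_t load_le32(const unsigned char b[4])
    {
        return (uint32_t)b[0]
             | ((uint32_t)b[1] << 8)
             | ((uint32_t)b[2] << 16)
             | ((uint32_t)b[3] << 24);
    }

    /* Interpret 4 bytes with b[0] (lowest address) as the MOST significant byte. */
    uint32_t load_be32(const unsigned char b[4])
    {
        return ((uint32_t)b[0] << 24)
             | ((uint32_t)b[1] << 16)
             | ((uint32_t)b[2] << 8)
             |  (uint32_t)b[3];
    }
    ```

    Given the bytes 78 56 34 12 in address order, load_le32 returns 0x12345678 and load_be32 returns 0x78563412. Same memory, two conventions for reading it.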

  14. #14
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    When I TYPE the value left to right, then I EXPECT it to be stored left to right in memory too. This is what I'm hinting at. It's not wrong. It's just unexpected to most of us.

  15. #15
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Elysia View Post
    When I TYPE the value left to right, then I EXPECT it to be stored left to right in memory too. This is what I'm hinting at. It's not wrong. It's just unexpected to most of us.
    What makes you think that lower addresses are "to the left of" higher addresses?

    EDIT: And I'd say that the opposite is expected by most of us, because the majority of chips and data formats out there are BIG endian, not little endian.
