Well, I just discovered that, for the novice, implementing Unicode I/O in C++ is about as fun as having your teeth pulled.
After scouring the web for hours, I found how to do it right here:
Unicode File I/O
What I don't understand about the solution is why the byte order mark looks inverted in the code (i.e. why it is not 0xFFFE), and why it is flipped into the normal byte order when it is written.
Code:
wchar_t BOM = 0xFEFF;
Inferring from this, I thought all wide characters were flipped when written or read. So if the representation of 'b' in a text file were 0x6200, I would have to manually flip it to 0x0062 before appending it to my internal wstring.
Well, I was wrong.
So what is it? Is Unicode inherently big-endian or little-endian (I'd assumed the former)? And can I go on using code like the following (which would mean Unicode is little-endian)?
Code:
wstring wstr;
char ch[2];
while(ifilestream)
{
    ifilestream.read(ch, 2);
    wstr += *ch;
}
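For comparison, here is a minimal sketch of how the two bytes of each code unit could be combined explicitly instead of relying on the machine's byte order. It assumes the file is UTF-16, that a byte order mark (bytes FF FE for little-endian, FE FF for big-endian) may or may not be present, and that little-endian is a reasonable fallback when it is missing. The helper name read_utf16 is made up for illustration, and surrogate pairs are left as raw code units.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the linked article): reads UTF-16 code
// units from a stream opened in binary mode. The byte order is taken from
// the BOM if one is present; otherwise little-endian is ASSUMED.
std::u16string read_utf16(std::istream& in)
{
    std::u16string out;
    bool little_endian = true;   // fallback assumption when there is no BOM
    bool first_unit = true;
    char bytes[2];

    // Testing the result of read() avoids appending a stale unit at EOF.
    while (in.read(bytes, 2))
    {
        const unsigned b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned b1 = static_cast<unsigned char>(bytes[1]);

        if (first_unit)
        {
            first_unit = false;
            // U+FEFF stored little-endian is FF FE; big-endian is FE FF.
            if (b0 == 0xFF && b1 == 0xFE) { little_endian = true;  continue; }
            if (b0 == 0xFE && b1 == 0xFF) { little_endian = false; continue; }
        }

        // Combine both bytes; appending only the first byte would lose
        // everything outside the Latin-1 range.
        const unsigned unit = little_endian ? (b0 | (b1 << 8))
                                            : ((b0 << 8) | b1);
        out.push_back(static_cast<char16_t>(unit));
    }
    return out;
}
```

Calling it on a stream opened with std::ios::binary should give the same text whichever byte order the file was saved in, as long as the BOM is there. Without a BOM the little-endian guess may simply be wrong for a given file.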





