Well, I just discovered that, for a novice, implementing Unicode I/O in C++ is about as fun as having your teeth pulled.
After scouring the web for hours, I found how to do it right here:
http://cboard.cprogramming.com/showt...hlight=wchar_t
Code:
wchar_t BOM = 0xFEFF;
What I don't understand about the solution is why the byte order mark looks inverted (i.e. why it is not 0xFFFE), and why it is flipped into the normal byte order when it is written.
Inferring from this, I thought all wide characters were flipped when written or read. So if the representation of 'b' in a text file were 0x6200, I would have to manually flip it to 0x0062 before appending it to my internal wstring.
Well, I was wrong.
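As far as I can tell, the apparent flip comes from how the machine lays out a 16-bit value in memory, not from the stream rearranging characters. Here is a minimal sketch of that idea, assuming the value is written byte for byte; `bytes_in_memory`, `is_little_endian` and `check_bom_layout` are hypothetical helper names of mine, not library functions, and I use `std::uint16_t` instead of `wchar_t` because `wchar_t` is not 16 bits on every platform.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the two bytes of a 16-bit code unit in the
// order this machine stores them in memory, which is also the order a raw
// binary write would put them in a file.
std::array<unsigned char, 2> bytes_in_memory(std::uint16_t unit)
{
    std::array<unsigned char, 2> out{};
    std::memcpy(out.data(), &unit, sizeof unit);
    return out;
}

// Hypothetical helper: true when the low-order byte is stored first.
bool is_little_endian()
{
    return bytes_in_memory(0x0001)[0] == 0x01;
}

// Checks the layout of the BOM value 0xFEFF on the current machine.
void check_bom_layout()
{
    const std::array<unsigned char, 2> bom = bytes_in_memory(0xFEFF);
    if (is_little_endian()) {
        // Low byte first: the file starts with FF FE.
        assert(bom[0] == 0xFF && bom[1] == 0xFE);
    } else {
        // High byte first: the file starts with FE FF.
        assert(bom[0] == 0xFE && bom[1] == 0xFF);
    }
}
```

On a little-endian machine the value 0xFEFF therefore lands in the file as the bytes FF FE, which only looks like 0xFFFE if you read the two bytes as one big-endian number. The same layout applies to every other 16-bit value, so 'b' (0x0062) is stored as 62 00 on such a machine.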
So which is it? Is Unicode inherently big- or little-endian (I'd assumed the former)? And can I keep using code like the following (which would mean Unicode is little-endian):
Code:
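wstring wstr;
char ch[2];
// Test the read itself, so a failed read at end-of-file does not append a stale character.
while (ifilestream.read(ch, 2))
{
    // Appends only the first byte of each pair, i.e. assumes the low byte comes first.
    wstr += *ch;
}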
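
For comparison, here is a rough sketch of a reader that does not hard-code either byte order but looks at the BOM instead. It assumes the file holds UTF-16 and has already been read into a `std::string` of raw bytes; `decode_utf16`, its `assume_big_endian` parameter and `check_decode_examples` are hypothetical names of mine, not standard functions. Surrogate pairs are simply passed through as two code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes raw UTF-16 bytes into 16-bit code units,
// using a leading BOM (if present) to pick the byte order. When there is
// no BOM, the order given by assume_big_endian is used (an assumption).
std::u16string decode_utf16(const std::string& raw, bool assume_big_endian = true)
{
    bool big_endian = assume_big_endian;
    std::size_t pos = 0;

    if (raw.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(raw[0]);
        const unsigned char b1 = static_cast<unsigned char>(raw[1]);
        if (b0 == 0xFE && b1 == 0xFF) { big_endian = true; pos = 2; }
        else if (b0 == 0xFF && b1 == 0xFE) { big_endian = false; pos = 2; }
    }

    std::u16string out;
    // Combine each complete byte pair; a trailing odd byte is ignored.
    for (; pos + 1 < raw.size(); pos += 2) {
        const unsigned char first = static_cast<unsigned char>(raw[pos]);
        const unsigned char second = static_cast<unsigned char>(raw[pos + 1]);
        const char16_t unit = big_endian
            ? static_cast<char16_t>((first << 8) | second)
            : static_cast<char16_t>((second << 8) | first);
        out.push_back(unit);
    }
    return out;
}

// Usage examples: the same letter 'b' stored in both byte orders.
void check_decode_examples()
{
    assert(decode_utf16(std::string("\xFF\xFE\x62\x00", 4)) == u"b"); // little-endian with BOM
    assert(decode_utf16(std::string("\xFE\xFF\x00\x62", 4)) == u"b"); // big-endian with BOM
}
```

The point of the sketch is that both byte orders are valid in a file, and the BOM is what tells the reader which one it is looking at. The two bytes of each pair are combined into one code unit rather than keeping only the first byte, so characters outside the ASCII range survive as well.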