Thread: Unicode file I/O

  1. #1
    Registered User
    Join Date
    Oct 2005
    Posts
    271

    Unicode file I/O

    Well, I just discovered that, for a novice, implementing Unicode I/O in C++ is about as fun as having your teeth pulled out.

    After scouring the web for hours, I found how to do it right here:

    http://cboard.cprogramming.com/showt...hlight=wchar_t

    Code:
    wchar_t BOM = 0xFEFF;
    What I don't understand about the solution is why the byte order mark looks inverted (i.e. why it is not 0xFFFE), and why it gets flipped into the normal byte order when it is written.

    Inferring from this, I thought all wide characters were flipped when written or read. So if the representation of 'b' in a text file were 0x6200, I would have to manually flip it to 0x0062 before appending it to my internal wstring.
    Well, I was wrong.

    So which is it? Is Unicode inherently big or little endian (I'd assumed the former)? And can I go on using code like the following (which would mean Unicode is little endian):
    Code:
    wstring wstr;
    char ch[2];
    while(ifilestream)
    {
    	ifilestream.read(ch, 2);
    	wstr += *ch;
    }

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > And can I go on using code like the following (which would mean Unicode is little endian):
    I would suggest you use the wide character types.
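
A minimal sketch of that advice, under some stated assumptions. Unicode itself has no byte order. Only the UTF-16 serialisation does, and it comes in little-endian and big-endian forms. The byte order mark is the code point U+FEFF, so a little-endian machine that writes a wchar_t holding 0xFEFF straight to disk stores the bytes FF FE, which is why the file appears "flipped". The helper names decode_utf16 and read_utf16_file below are hypothetical, not from the linked thread. The sketch assumes the file holds UTF-16, that a file without a BOM is little-endian, and that storing one 16-bit code unit per wchar_t is acceptable (surrogate pairs are not combined). It builds each wide character from both bytes, whereas `wstr += *ch` in the loop above appends only the first byte of each pair.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not from the thread): turn raw UTF-16 bytes into
// wide characters, honouring a leading byte order mark if there is one.
std::wstring decode_utf16(const std::string& bytes)
{
    bool little_endian = true;  // assumption: no BOM means little-endian
    std::size_t pos = 0;

    if (bytes.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if (b0 == 0xFF && b1 == 0xFE) {         // U+FEFF stored low byte first
            pos = 2;
        } else if (b0 == 0xFE && b1 == 0xFF) {  // U+FEFF stored high byte first
            little_endian = false;
            pos = 2;
        }
    }

    std::wstring out;
    for (; pos + 1 < bytes.size(); pos += 2) {
        const unsigned first  = static_cast<unsigned char>(bytes[pos]);
        const unsigned second = static_cast<unsigned char>(bytes[pos + 1]);
        // Combine the two bytes into one 16-bit code unit.
        const unsigned unit = little_endian ? ((second << 8) | first)
                                            : ((first << 8) | second);
        out += static_cast<wchar_t>(unit);
    }
    return out;  // a trailing odd byte, if any, is ignored
}

// Hypothetical helper: read the whole file in binary mode, then decode it.
// A file that cannot be opened yields an empty string.
std::wstring read_utf16_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return decode_utf16(bytes);
}
```

With this, the bytes 62 00 after an FF FE mark and the bytes 00 62 after an FE FF mark both decode to L"b", so no manual flipping is needed once the mark has been inspected.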
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
