Thread: Reading a language file (UTF-8 to wstring)

  1. #1
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262

    Reading a language file (UTF-8 to wstring)

    Earlier, I asked a question about i18n. Thanks to those who answered (I could post a thanks there, but I don't want to bump the question unnecessarily). However, I have another question about the same subject.

    For the program I'm making, I want to have a single file (let's say encoded in UTF-8; it could be anything, but I prefer UTF-8) holding translations of certain sentences. Now, how can I best read that into a wchar_t string?
    As far as I know, I can't simply use a wfstream, since the encoding it reads isn't defined by the standard: it could be UTF-16, UTF-32 or UTF-8. Also, I can't find a portable way to switch it to UTF-8. Besides, I tested something like this: "wstring s; while(wcin>>s);". It ran the while loop as long as I typed ASCII characters; the moment I typed, for instance, a euro sign, it quit the loop, even though my terminal does seem to support the euro sign.
    I could read the file byte by byte and decode the UTF-8 to Unicode code points myself. I've done that before and it's really easy, but then appending the characters to the wstring is probably not portable. The wstring, as I understand it, can hold either UTF-16 (16-bit wchar_t on Windows) or UTF-32 (on Linux). Maybe even UTF-8 on other architectures.

    So, what is a *portable* way to input a UTF-8 file into a wide string?


    Thanks in advance,
    EVOEx

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    This looks like something that would correspond well to what you want. I don't know of any better source, but I'd be surprised if there isn't one out there.

    http://www.codeproject.com/KB/string/UtfConverter.aspx

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Thanks a lot for your reply. It doesn't look too portable, since it makes a distinction based on the size of wchar_t, but if this is as portable as it's going to get I can live with it.
    I do get the impression that internationalization support is, in every aspect, still really insufficient. The documentation I've read about it is inconsistent and the solutions are unportable. But I guess we'll have to wait for C++0x for it to get better...
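
    For illustration, here is roughly what a distinction on the size of wchar_t can look like. This is only a minimal sketch and not the code from the linked article: the function name utf8_to_wstring is made up here, and it assumes that a wchar_t of 4 or more bytes holds UTF-32 and that a smaller one holds UTF-16, which is the usual case on Linux and Windows but is not guaranteed by the standard.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the linked article).
// Decodes a UTF-8 byte string into a std::wstring.
// Assumption: wchar_t of 4+ bytes stores UTF-32, otherwise UTF-16.
std::wstring utf8_to_wstring(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;   // decoded code point
        std::size_t extra;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong encodings, surrogates and out-of-range values.
        static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8 input");

        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            // Fits in a single wchar_t (UTF-32, or BMP character in UTF-16).
            out += static_cast<wchar_t>(cp);
        } else {
            // 16-bit wchar_t: split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

    The file itself would be read as plain bytes (for example with an ifstream opened in binary mode) into a std::string and then passed to this function, so no wide stream or locale is involved in the decoding.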

    Thanks,
    EVOEx
