Thread: Unicode

  1. #1
    Registered User
    Join Date: Oct 2005
    Location: Genk, Belgium
    Posts: 8

    Unicode

    Hello,

    I want to use Unicode in my app. By that I mean that it should be able to read and write files in Unicode. Obviously that means I should also use it internally.
    I've read the tutorial and many others I found via Google, but they only made me more confused.
    (Assume Windows NT or later, though cross-platform support for Linux or Mac would be nice.)
    1. What do I use to store data? wchar_t? The tutorial recommends using something like an unsigned long... How do I store strings then? Do I create my own string class, or will wstring do?
    2. File input/output: are there functions that detect the encoding reliably (UTF-8, UTF-16LE/BE, ...), or should I do it myself (BOM detection)?
    3. Once the encoding is detected, what functions exist to read the file into a string and convert it to the internal storage I chose in 1?
    Also, if you know a good open-source library for this, suggestions are welcome.

    Thanks a lot in advance...

  2. #2
    Registered User
    Join Date: Nov 2005
    Posts: 52
    1. If you want to make life simple on the development side, instantiating basic_string for a 32-bit character type and storing UTF-32 in it lets you use all of the basic_string features you're already familiar with. According to the Unicode site, this does waste space, however. I'm not sure how well a wstring would work with UTF-16, though on the surface it appears to be a valid method.

    2. Darn good question... but BOM detection seems to be the most surefire way (unless the libraries below handle detection too).

    3. http://www.unicode.org/onlinedat/products.html#3 lists libraries (commercial and open source) for Unicode handling.

  3. #3
    Registered User
    Join Date: Oct 2005
    Location: Genk, Belgium
    Posts: 8
    Thanks for the reply.
    I've been writing some code that detects the BOM to determine the encoding. But before I write my own string class: what encoding does Windows itself use in its APIs? UTF-16? If so, LE or BE?


Similar Threads

  1. <string> to LPCSTR? Also, character encoding: UNICODE vs ?
    By Kurisu33 in forum C++ Programming
    Replies: 7
    Last Post: 10-09-2006, 12:48 AM
  2. Unicode - a lot of confusion...
    By Jumper in forum Windows Programming
    Replies: 11
    Last Post: 07-05-2004, 07:59 AM
  3. Should I go to unicode?
    By nickname_changed in forum C++ Programming
    Replies: 10
    Last Post: 10-13-2003, 11:37 AM
  4. printing non-ASCII characters (in unicode)
    By dbaryl in forum C Programming
    Replies: 1
    Last Post: 10-25-2002, 01:00 PM
  5. UNICODE and GET_STATE
    By Registered in forum C++ Programming
    Replies: 1
    Last Post: 07-15-2002, 03:23 PM