-
Unicode
Hello,
I want to use Unicode in my app. By that I mean it should be able to read and write Unicode files. Obviously that means I should also use it internally.
I've read the tutorial, and many others I found via Google, but they only made me more confused :(
(Assume I use Windows NT or above, though cross-platform support for Linux or Mac would be nice.)
- What do I use to store data? wchar_t? The tutorial says it's recommended to use something like an unsigned long... How do I store strings then? Create my own class, or will wstring do?
- File input/output. Are there functions that detect the encoding nicely (UTF-8, UTF-16LE/BE, ...) or should I do it myself (BOM detection)?
- Once the encoding is detected, what functions exist to read into a string and convert to the internal storage I chose in the first point?
Also, if you know a good open source library, a recommendation is welcome.
Thanks a lot in advance...
-
1. If you want to make life simple on the development side, an instantiation of basic_string for a 32-bit character type, holding UTF-32, would let you use all of the basic_string features you're familiar with. According to the Unicode site this wastes space, however. I'm not sure how well a wstring would work with UTF-16 (though on the surface it appears to be a valid method).
2. Darn good question... but BOM detection seems to be the most surefire way (unless the libraries below handle the detection too).
3. Libraries (commercial and open source) for Unicode handling: http://www.unicode.org/onlinedat/products.html#3
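
To make point 1 a bit more concrete, here is a minimal sketch of the "basic_string over a 32-bit type" idea. It assumes a C++11 compiler, where std::u32string is the standard name for std::basic_string<char32_t>. The helper utf8_to_utf32 is a hypothetical name made up for this example, not part of any library, and it only shows the UTF-8 case.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32, one char32_t per
// code point. Throws std::runtime_error on malformed input.
std::u32string utf8_to_utf32(const std::string& in) {
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static const char32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogates, and out-of-range values.
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With this, size(), find(), substr() and friends all operate on whole code points, which is the convenience point 1 describes. The cost is four bytes per code point. Note that a code point is still not always one user-visible character (combining marks, for example), so this simplifies indexing but does not solve every text-handling problem.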
-
Thanks for the reply.
I've been writing some code to detect the BOM and so determine the encoding. But before I write my own string class: what encoding does Windows itself use in its APIs? UTF-16? If so, LE or BE?
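
For reference, BOM detection along these lines could be sketched as follows. This is only a sketch under assumptions: the Encoding enum, the BomResult struct and the detect_bom function are hypothetical names invented for the example, and the caller is assumed to have already read the first few bytes of the file into a std::string. The byte patterns themselves are the standard BOM signatures for UTF-8, UTF-16 and UTF-32.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names, for illustration only.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

struct BomResult {
    Encoding encoding;       // Unknown if no BOM was found
    std::size_t bom_length;  // number of bytes to skip before the text
};

BomResult detect_bom(const std::string& data) {
    const std::size_t n = data.size();
    auto at = [&](std::size_t i) {
        return static_cast<unsigned char>(data[i]);
    };

    // UTF-32 is checked before UTF-16 because the UTF-32LE BOM (FF FE 00 00)
    // begins with the UTF-16LE BOM (FF FE). A UTF-16LE file that starts with
    // U+0000 would be misread here; that case is assumed to be rare.
    if (n >= 4 && at(0) == 0xFF && at(1) == 0xFE && at(2) == 0x00 && at(3) == 0x00)
        return {Encoding::Utf32LE, 4};
    if (n >= 4 && at(0) == 0x00 && at(1) == 0x00 && at(2) == 0xFE && at(3) == 0xFF)
        return {Encoding::Utf32BE, 4};
    if (n >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return {Encoding::Utf8, 3};
    if (n >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return {Encoding::Utf16LE, 2};
    if (n >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return {Encoding::Utf16BE, 2};

    // No BOM: the file may still be UTF-8 or a legacy code page, so the
    // caller has to fall back to a default or a heuristic.
    return {Encoding::Unknown, 0};
}
```

A caller would skip bom_length bytes and then decode the rest according to the reported encoding. Files without a BOM are common, especially UTF-8 ones, so an Unknown result does not mean the file is not Unicode.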