| | #1 |
| Registered User Join Date: May 2008
Posts: 4
| I'm writing myself a little GTK app and I'd like to have proper internationalization/localization support. I've done a lot of reading on the subject, but a couple of things still baffle me:

I've decided to default to UTF-8, which seems to be the norm for most programs now. I understand that UTF-8 is encoded as, and works with, standard char; however, I've also read that I should be using wchar_t throughout my program instead, in case I need to change to UTF-16 or other encodings. Is this true, or should I simply use char? It's my understanding that wchar_t both increases the memory usage of a program rather dramatically and is platform dependent (2 bytes on Win32, 4 on Linux, etc.), so that's my most biting question...

Should I be using wchar_t or std::wstring? What really is the difference between char/std::string and wchar_t/std::wstring besides basic encapsulation? Does it merely exist to make die-hard C++ people warm and fuzzy, or is there a real technical reason why the [w]string class is superior to simple [w]char arrays? (Sorry, I come from a C background.)

How does encoded (UTF-8, UTF-16, etc.) data work across a network? Is it basically the same as using char, or are there "weird" things I need to know/look out for?

Along those lines, how does data look to a human if saved to a file? I'd like people in the US to be able to see the data in a regular text editor, but I'm afraid that if everything is UTF-8 encoded, it'll look like gobbledygook...

Sorry for all the questions, and I really appreciate any insight you can give me. j |
| samblack is offline | |
| | #2 | |||||
| Registered User Join Date: Apr 2006
Posts: 1,193
| Quote:
wchar_t is meant to provide a wide character type, so that each character can occupy more than one byte. This makes fixed-width international encodings possible. Quote:
Quote:
Quote:
Quote:
__________________ It is too clear and so it is hard to see. A dunce once searched for fire with a lighted lantern. Had he known what fire was, He could have cooked his rice much sooner. | |||||
| King Mir is offline | |
| | #3 | ||||
| Registered User Join Date: May 2008
Posts: 4
| Quote:
Quote:
Quote:
Quote:
I guess my problem here is, for example, that in ASCII I can count on things like "if I find a \r\n then I know the server is done sending information", and I don't know whether I can do that with UTF-8 encoded data (or even how to input it into a String() buffer)... | ||||
| samblack is offline | |
| | #4 |
| Registered User Join Date: May 2008
Posts: 4
| I'm sorry, I forgot to thank you for your reply. It was very educational! |
| samblack is offline | |
| | #5 |
| Registered User Join Date: Mar 2003
Posts: 3,844
| What you have to remember is that UTF-8 is an encoding of a sequence of Unicode characters. UTF-16LE is another encoding of Unicode characters. Unicode is a character set, and UTF-8 and UTF-16LE are ways to encode a given sequence of Unicode characters. So if a spoken language can be represented in Unicode, then it can be encoded using either method. So, if you support UTF-8, then you support Unicode (using a UTF-8 encoding). Good info: http://www.cl.cam.ac.uk/~mgk25/unicode.html http://www.i18nguy.com/unicode/c-unicode.html MS targeted, but very informative. Has a table of file BOMs. Main site is nice too. gg |
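As a rough illustration of what "UTF-8 is an encoding" means for code that handles raw bytes: in UTF-8, every byte that belongs to a multi-byte character has its high bit set, so ASCII bytes such as \r and \n never appear inside an encoded non-ASCII character. That means scanning a receive buffer for "\r\n" works just as it does with plain ASCII. The sketch below assumes the input is valid UTF-8; the function names `utf8_length` and `find_crlf` are made up for this example and do not come from any library.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumption: the input is valid UTF-8 (no validation is done here).
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte: starts a new character
        }
    }
    return count;
}

// Find the CRLF terminator in a UTF-8 byte buffer. A plain byte search is
// safe because 0x0D and 0x0A cannot occur inside a multi-byte sequence.
// Returns std::string::npos if the terminator has not arrived yet.
std::size_t find_crlf(const std::string& buffer) {
    return buffer.find("\r\n");
}
```

For the buffer "caf\xC3\xA9\r\n" (the word "café" followed by CRLF), `size()` reports 7 bytes, `utf8_length` reports 6 characters, and `find_crlf` returns offset 5. The byte count and the character count differ, but the terminator search is unaffected.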
| Codeplug is offline | |
| | #6 | ||||
| Registered User Join Date: Apr 2006
Posts: 1,193
| Quote:
Quote:
It is also necessary for printing and reading: standard libraries will have a specific encoding for each char type, generally extended ASCII for char and UTF-16 or UTF-32 for wchar_t. So if you're trying to print or read the characters from standard input/output, you need wchar_t. Quote:
1) Have a fixed max size. std::string does away with this restriction. In this case it's not "just convenience", because by using std::string you are adding features to your code.
2) Manually resize the array if it gets too big. The problem with this is that you are turning what should be a simple task (reading a string from wherever) into a multi-line conglomerate. std::string hides that conglomerate, so your code is easy to read but still has the features you want. Your code will still say "read a string of any number of characters from the user", but it will do so in one line, not several. This isn't convenience, it's code readability.
3) Write your own functions that read a block of data and return a variable-size array. This is a good solution, except that it's basically what std::string does for you. This is convenience, but it also means that you know the code works without having to test it yourself.
Quote:
Last edited by King Mir; 05-09-2008 at 07:03 PM. | ||||
| King Mir is offline | |
| Tags |
| i18n, l10n, std::wstring, wchar_t |