| | #1 |
| Registered User Join Date: Mar 2004
Posts: 36
| Unicode - a lot of confusion...

Codepage, threads, text-based controls, locale (global), wide-character, multi-byte characters: can anyone please explain how these terms depend on each other?

My current understanding is that wide-character and multi-byte characters are schemes for storing character codes, and that a codepage maps a specific code to a character glyph (font entry)... I'm quite confused on the subject. I know how to make my program unicode-aware, but actually getting it all to work together is pretty hard... I searched and read a lot of info on unicode, but no source actually explains how the unicode entities are interconnected...

Here are just a few of the questions I have:
----------------------------------------
How are locale and codepage related to each other?
----------------------------------------
Say I have a text-based control and the documentation says that it uses a locale to output text, OR that it uses some codepage for symbol translation... So if I want to print each line of text in a different language, how is that possible? A codepage can represent only 3 languages simultaneously and a locale only 2...
----------------------------------------
I also read that I can set up a different locale for each thread?? Are threads locale-aware? And what is that good for?
----------------------------------------
Another: as I see it, a win32 unicode app can treat symbols as wide characters (two-byte characters) or as multi-byte characters... I know what a multi-byte character encoding is (UTF-8, UTF-7), but does the above case refer to how characters are stored in memory?
----------------------------------------
More about codepages and locales: which win32 objects depend on codepage and/or locale, and does this mean that these objects are limited to handling only 3 or 2 types of glyphs (languages) simultaneously?
----------------------------------------
For example: I want to write a program that reads a unicode file, say UTF-8, with a few lines in different languages. Now, I know that there are routines which are locale based and routines which are unicode based. As I see it, if I use a locale-aware routine then the file won't be read properly, because the locale translates character codes according to some codepage (global?) and they will be mapped to the wrong glyphs... So I must use the general unicode i/o functions... So what are locale-dependent functions good for?
----------------------------------------
I wrote a lot and maybe some questions are not formulated well, but I hope someone can put things in their place. Thanks |
| Jumper is offline | |
| | #2 | ||||
| Yes, my avatar is stolen Join Date: Dec 2002
Posts: 2,544
| Here goes... Quote:
Quote:
Unicode aims to be a universal character set that contains characters for every script. Therefore it is not dependent on codepages. The value 156 maps to the same character whether your computer is set up for Chinese or Portuguese.

A locale holds various language and format settings. For example, consider that we want today's date in short format. If we pass a US locale to GetDateFormat() we will get:
6-30-04
However, if we pass a UK locale to GetDateFormat() we will get:
30-6-04
You can see the Locale Information page for more details on what a windows locale can control. A locale may also hold details on the currently used codepage. As well as Windows locales there are also C locales. Quote:
Quote:
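- Read the UTF-8 into a LPSTR.
- Convert to unicode using MultiByteToWideChar(CP_UTF8, ...);
- Use the resulting unicode string.

Sample code to add a UTF8 string to a list box.

Code:
#include <windows.h>

/* Converts a NUL-terminated UTF-8 string to unicode and adds it to the list box. */
void AddUTF8ToListBox(HWND hwndListBox, LPCSTR szUTF8)
{
    WCHAR szW[1024];

    /* -1 tells the function the input is NUL-terminated; it returns 0 on failure. */
    if (0 != MultiByteToWideChar(CP_UTF8, 0, szUTF8, -1, szW, 1024))
    {
        /* Call the W version explicitly so the string is passed as unicode. */
        SendMessageW(hwndListBox, LB_ADDSTRING, 0, (LPARAM) szW);
    }
}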
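To make the conversion step a bit more concrete, here is a minimal portable sketch of what turning UTF-8 into 16-bit units involves. It is not the Win32 implementation: utf8_to_utf16 is a hypothetical helper written only for this illustration. It borrows the return convention of MultiByteToWideChar when called with -1 (the count includes the terminator, 0 means failure), and it leaves out checks a real decoder needs, such as rejecting overlong forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Win32 API): decodes a NUL-terminated UTF-8
 * string into UTF-16 code units. Returns the number of units written,
 * including the terminator, or 0 on malformed input or a full buffer. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (*s != 0) {
        uint32_t cp;
        int extra;  /* continuation bytes expected after the lead byte */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return 0;  /* stray continuation byte or invalid lead byte */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80) return 0;  /* truncated sequence */
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
            s++;
        }

        if (cp < 0x10000) {
            /* One 16-bit unit, leaving room for the terminator. */
            if (n + 2 > cap) return 0;
            dst[n++] = (uint16_t)cp;
        } else {
            /* Outside the 16-bit range: encode as a surrogate pair. */
            if (cp > 0x10FFFF || n + 3 > cap) return 0;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 + (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }

    if (n + 1 > cap) return 0;
    dst[n] = 0;
    return n + 1;
}
```

With this sketch, "é" occupies two bytes in UTF-8 (0xC3 0xA9) but a single 16-bit unit (0x00E9) after conversion, which is the multi-byte versus wide-character distinction asked about in the first post.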
Last edited by anonytmouse; 06-29-2004 at 10:00 PM. | ||||
| anonytmouse is offline | |
| | #3 |
| Registered User Join Date: Mar 2004
Posts: 36
| OK, thanks. I think that clears matters up.

One more question I have is: how many codepages can the process (app) have? Is there something called a global codepage which applies through all code and all threads? Last edited by Jumper; 06-30-2004 at 03:27 AM. |
| Jumper is offline | |
| | #4 |
| Yes, my avatar is stolen Join Date: Dec 2002
Posts: 2,544
| >> how many codepages can the process (app) have? <<
I'm not sure what you mean here. Could you elaborate?

>> is there something called a global codepage which applies through all code and all threads? <<
Yes, this is sometimes called the ansi code page. For example, when you call:
Code: SendMessageA(hwndListBox, LB_ADDSTRING, 0, (LPARAM) "my_string");
the string is interpreted using the ansi code page.

For applications that must support input in several different code pages (also called char sets), such as a web browser or email client, the input should be read in and then converted to unicode using MultiByteToWideChar(). If an application needs to provide non-unicode output it should use WideCharToMultiByte(). Either way, a modern multi-lingual windows application should do all internal work in unicode and only convert, if needed, on input and output.

For example, with a page that specifies a character set of:
Code: <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
you would convert the input with:
Code: MultiByteToWideChar(51932, ...); |
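One way to get from a charset label to the number passed to MultiByteToWideChar() is a small lookup table. The following is only a sketch: codepage_from_charset and names_equal are hypothetical helpers invented for this example, and the table is a short sample of Windows code page identifiers, not a complete list.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive ASCII comparison; returns 1 when the names match. */
static int names_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* both strings must end at the same point */
}

/* Hypothetical helper: maps a charset label (as found in a META tag)
 * to a Windows code page identifier. Returns 0 for unknown labels. */
unsigned int codepage_from_charset(const char *charset)
{
    static const struct {
        const char  *name;
        unsigned int codepage;
    } table[] = {
        { "utf-8",        65001 },
        { "euc-jp",       51932 },
        { "shift_jis",      932 },
        { "windows-1251",  1251 },
        { "windows-1252",  1252 },
        { "iso-8859-1",   28591 },
    };
    size_t i;

    if (charset == NULL)
        return 0;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (names_equal(charset, table[i].name))
            return table[i].codepage;
    }
    return 0;
}
```

Note that 0 is also the value of CP_ACP, so a caller should check for it explicitly instead of passing it straight on, otherwise an unknown charset silently falls back to the ansi code page.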
| anonytmouse is offline | |
| | #5 |
| Registered User Join Date: Mar 2004
Posts: 36
| OK... another...

[1] Is it true that if I compile a unicode-aware app and run it, the system will load, say, ISO 10646 (aka Unicode UCS-2 Little-Endian) as the default global codepage that will be used by all controls of my app?

[2] And another one related to codepages and fonts... Suppose the system loads ISO 10646 as a codepage for my app and I use some font, say in a textbox, to write text. As I see it, I can't use any font I like, right? (Because not every font contains every character defined by the above codepage.)

[3] Is there also a distinction between input codepage and output codepage? Last edited by Jumper; 07-01-2004 at 03:04 AM. |
| Jumper is offline | |
| | #6 |
| Yes, my avatar is stolen Join Date: Dec 2002
Posts: 2,544
| >> is it true that if I compile a unicode-aware app and run it, the system will load, say, ISO 10646 (aka Unicode UCS-2 Little-Endian) as the default global codepage that will be used by all controls of my app? <<
No, not really. Nearly all functions that accept strings are split into two versions: an A version and a W version. The W version accepts unicode strings and the A version accepts strings in the default ansi code page. The unadorned version maps to the A or W version depending on whether UNICODE is defined.
Code:
#ifdef UNICODE
#define SetConsoleTitle SetConsoleTitleW
#else
#define SetConsoleTitle SetConsoleTitleA
#endif
Code:
SetConsoleTitle(TEXT("app"));
// becomes if UNICODE is defined:
SetConsoleTitleW(L"app");
// otherwise becomes
SetConsoleTitleA("app");
Therefore, if UNICODE is defined, you will be passing unicode strings to your controls because you will be implicitly using functions like SetWindowTextW and SendMessageW. However, the default ansi code page will not have changed.

>> suppose the system loads ISO 10646 as a codepage for my app and I use some font, say in a textbox, to write text. As I see it, I can't use any font I like, right? (Because not every font contains every character defined by the above codepage.) <<
I'm not sure. There are certainly methods to create a combined font if a single one is not enough; whether the windows controls use this, I am not sure. You may need specialist help. I'd suggest microsoft.public.win32.programmer.international. |
| anonytmouse is offline | |
| | #7 | |
| Registered User Join Date: Mar 2004
Posts: 36
| Quote:
If I compile a unicode app, does this mean that the system will load a codepage (like ISO 10646) for my app? (If my app continues to use the ansi codepage then it is only possible to support 2 languages.) | |
| Jumper is offline | |
| | #8 |
| Yes, my avatar is stolen Join Date: Dec 2002
Posts: 2,544
| I'm not sure what you are saying. Unicode can hold characters for all languages. It is code page independent. |
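As a small illustration of the difference, the sketch below decodes the same byte under two code pages. codepoint_from_byte is a hypothetical helper made up for this example, and it covers only a slice of each code page (the 0xA0-0xFF range of Windows-1252 and the 0xC0-0xFF range of Windows-1251), so treat it as a demonstration, not a conversion routine.

```c
#include <stdint.h>

/* Hypothetical helper: returns the Unicode code point for one byte under
 * the given code page. Only a slice of code pages 1252 and 1251 is
 * covered; anything else returns 0. */
uint32_t codepoint_from_byte(unsigned int codepage, unsigned char byte)
{
    if (byte < 0x80)
        return byte;  /* ASCII is shared by both code pages */

    if (codepage == 1252 && byte >= 0xA0)
        return byte;  /* maps straight onto U+00A0..U+00FF */

    if (codepage == 1251 && byte >= 0xC0)
        return 0x0410u + (uint32_t)(byte - 0xC0);  /* Cyrillic U+0410..U+044F */

    return 0;  /* not covered by this sketch */
}
```

The byte 0xE0 comes out as U+00E0 (à) under code page 1252 and as U+0430 (Cyrillic а) under code page 1251, so the byte alone is ambiguous. The code points U+00E0 and U+0430 mean the same characters on every system, which is why a string that is already unicode needs no code page at all.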
| anonytmouse is offline | |
| | #9 |
| Registered User Join Date: Mar 2004
Posts: 36
| You said before that the system loads the ansi codepage for an app. If my app is a unicode app, then what codepage will the system load (I thought it must be something like ISO 10646)? I hope it is clearer... |
| Jumper is offline | |
| | #10 |
| Yes, my avatar is stolen Join Date: Dec 2002
Posts: 2,544
| We seem to be going around in circles.

>> if my app is a unicode app, then what codepage will the system load (I thought it must be something like ISO 10646)? <<
No. The default ansi code page will not change whatever app you load. There is no hard distinction between a unicode app and a non-unicode app. The unicode app simply uses the W functions while a non-unicode app uses the A functions. Some apps use a mixture. Again, for a program that deals only in unicode, the ansi code page is irrelevant.

This site may be helpful: http://www.microsoft.com/globaldev/DrIntl/default.mspx |
| anonytmouse is offline | |
| | #11 | |
| Registered User Join Date: Mar 2004
Posts: 36
| OK, thanks for the link... I am experimenting with console programming now and I have a question related to this thread's subject. I use the ReadConsoleInput() function to read keyboard events. The KEY_EVENT_RECORD structure contains two fields: wVirtualKeyCode and wVirtualScanCode. The documentation explains these fields: Quote:
of device-independent manner) [EDIT] I looked at this site and there are no explanations there for the kind of things I don't know. Last edited by Jumper; 07-02-2004 at 09:48 AM. | |
| Jumper is offline | |
| | #12 |
| Registered User Join Date: Mar 2004
Posts: 36
| anonytmouse, thanks for the link you gave me... I am now reading an article which has turned out to be very useful. Here is the link to the article, in case someone else is also struggling with unicode: http://www.microsoft.com/globaldev/g...s/wrguide.mspx Last edited by Jumper; 07-05-2004 at 08:03 AM. |
| Jumper is offline | |