Thread: How is a REAL Unicode string included and displayed in a C program?

  1. #1
    Registered User
    Join Date
    Oct 2005
    Posts
    42

    How is a REAL Unicode string included and displayed in a C program?

    Hello everyone,

    We have read a lot about writing Unicode-ready programs since Charles Petzold’s Programming Windows, 5th Edition. But how is a REAL Unicode string included in a C program and displayed correctly?

    Below is the program from the book. It’s supposed to be a Unicode-ready program when it’s compiled with _UNICODE defined.

    Now I want to insert two real Unicode characters 天大 into the string

    TEXT ("Hello, Windows 98!").

    I copied them from the MS Character Map. Their code points are U+5929 and U+5927. The string now looks like

    TEXT ("Hello ??, Windows 98!")

    or

    TEXT ("Hello \x5929\x5927, Windows 98!")

    in Visual Studio 2003. When it’s compiled and run, the string is displayed:


    Hello??, Windows 98!

    OR

    HelloII, Windows 98!

    The problem here is how one-byte ASCII characters and two-byte Unicode characters can be mixed in a TEXT(...) string so that the compiler identifies them correctly and translates them properly into an internal Unicode string. The second issue is that the default code page is Unicode UTF-16, and this code page is used to display the string. Does anybody here know how the MS VC team handles these issues?

    So, how is a REAL Unicode string included and displayed correctly in a VC++ program?

    Use a BOM? Where is the BOM put, and how?

    Can anybody modify the following code to make it display correctly?



    Thanks,

    Code:
     
    /*------------------------------------------------------------
       HELLOWIN.C -- Displays "Hello, Windows 98!" in client area
                     (c) Charles Petzold, 1998
      ------------------------------------------------------------*/

    #include <windows.h>

    LRESULT CALLBACK WndProc (HWND, UINT, WPARAM, LPARAM) ;

    int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance,
                        PSTR szCmdLine, int iCmdShow)
    {
         static TCHAR szAppName[] = TEXT ("HelloWin") ;
         HWND         hwnd ;
         MSG          msg ;
         WNDCLASS     wndclass ;
    
         wndclass.style         = CS_HREDRAW | CS_VREDRAW ;
         wndclass.lpfnWndProc   = WndProc ;
    
         wndclass.cbClsExtra    = 0 ;
         wndclass.cbWndExtra    = 0 ;
         wndclass.hInstance     = hInstance ;
         wndclass.hIcon         = LoadIcon (NULL, IDI_APPLICATION) ;
         wndclass.hCursor       = LoadCursor (NULL, IDC_ARROW) ;
         wndclass.hbrBackground = (HBRUSH) GetStockObject (WHITE_BRUSH) ;
         wndclass.lpszMenuName  = NULL ;
         wndclass.lpszClassName = szAppName ;
    
         if (!RegisterClass (&wndclass))
         {
              MessageBox (NULL, TEXT ("This program requires Windows NT!"), 
                          szAppName, MB_ICONERROR) ;
              return 0 ;
         }
    
         hwnd = CreateWindow (szAppName,                  // window class name
                              TEXT ("The Hello Program"), // window caption
                              WS_OVERLAPPEDWINDOW,        // window style
                              CW_USEDEFAULT,              // initial x position
                              CW_USEDEFAULT,              // initial y position
                              CW_USEDEFAULT,              // initial x size
                              CW_USEDEFAULT,              // initial y size
                              NULL,                       // parent window handle
                              NULL,                       // window menu handle
                              hInstance,                  // program instance handle
                              NULL) ;                     // creation parameters
    
         ShowWindow (hwnd, iCmdShow) ;
         UpdateWindow (hwnd) ;
    
         while (GetMessage (&msg, NULL, 0, 0))
         {
              TranslateMessage (&msg) ;
              DispatchMessage (&msg) ;
         }
         return msg.wParam ;
    }
    
    LRESULT CALLBACK WndProc (HWND hwnd, UINT message, WPARAM wParam, LPARAM lParam)
    {
         HDC         hdc ;
         PAINTSTRUCT ps ;
         RECT        rect ;
    
         switch (message)
         {
         case WM_CREATE:
              PlaySound (TEXT ("hellowin.wav"), NULL, SND_FILENAME | SND_ASYNC) ;
              return 0 ;
    
         case WM_PAINT:
              hdc = BeginPaint (hwnd, &ps) ;
    
              GetClientRect (hwnd, &rect) ;
    
              DrawText (hdc, TEXT ("Hello\x5929\x5927, Windows 98!"), -1, &rect,
                        DT_SINGLELINE | DT_CENTER | DT_VCENTER) ;
    
              EndPaint (hwnd, &ps) ;
              return 0 ;
    
         case WM_DESTROY:
              PostQuitMessage (0) ;
              return 0 ;
         }
    
         return DefWindowProc (hwnd, message, wParam, lParam) ;
    }

  2. #2
    Registered User OnionKnight's Avatar
    Join Date
    Jan 2005
    Posts
    555
    There is a C99 escape sequence, \u####, that will do what you want, but I guess MSVC++ doesn't have that. In a narrow string, \x## is restricted to one byte, and even if it wasn't, you'd have to convert your universal character to a UTF-16 sequence.
    Standard C is pretty lacking when it comes to newfangled stuff; somehow complex numbers are far more interesting than Unicode and threads.
    Other than \u I think it will get pretty painful. libiconv might be helpful unless there's a neat Windows API function that you can use.
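The conversion from a universal character to a UTF-16 sequence mentioned above can be sketched in portable C. This is a minimal sketch, not something from the thread: the function name utf16_encode is made up for illustration, and it uses uint16_t rather than the Windows WCHAR type so that it compiles anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 code units.
   Returns the number of units written to out (1 or 2), or 0 if cp is
   not a valid scalar value (a surrogate, or above U+10FFFF).
   utf16_encode is a hypothetical helper name, not a library function. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {              /* BMP: the code point is the code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                   /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));     /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* low surrogate */
    return 2;
}
```

For the two characters in this thread, U+5929 and U+5927, the result is a single code unit equal to the code point, so no real conversion is needed. Only code points above U+FFFF need the surrogate pair.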

  3. #3
    Registered User
    Join Date
    Oct 2005
    Posts
    42
    When the string TEXT ("Hello\x5929\x5927, Windows 98!") is changed to TEXT ("Hello\x29\x59\x27\x59, Windows 98!") or TEXT ("Hello\x59\x29\x59\x27, Windows 98!"), the program displays:

    Hello)Y'Y, Windows 98! or HelloY)Y', Windows 98!

    It treats them as the normal ASCII chars.

    For the string TEXT ("Hello\u5929\u5927, Windows 98!"), the compiler shows the warning

    warning C4129: 'u' : unrecognized character escape sequence

    So what is the solution?

    Thanks,
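The ")Y" result can be reproduced with plain wide literals. The sketch below assumes a Unicode build, where TEXT("...") expands to L"...", and uses wchar_t directly instead of TCHAR so it compiles without <windows.h>. The variable names are invented for illustration.

```c
#include <wchar.h>

/* Two escapes give two wide characters: U+0029 ')' and U+0059 'Y'. */
const wchar_t two_escapes[] = L"\x29\x59";

/* One escape gives one wide character: U+5929. */
const wchar_t one_escape[] = L"\x5929";

/* A hex escape swallows every hex digit that follows it, so a literal
   like L"\x5929abc" would not mean U+5929 followed by "abc".
   Splitting the literal ends the escape; the compiler then
   concatenates the adjacent pieces. */
const wchar_t split_literal[] = L"\x5929" L"abc";
```

If this holds on the compiler in question, the wide form of "Hello\x5929\x5927, Windows 98!" already contains the intended code units, which would point to the display side (for example the font) rather than the literal.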

  4. #4
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Just load user-displayable strings from a string table resource. It's the proper way to do it anyway.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  5. #5
    Registered User OnionKnight's Avatar
    Join Date
    Jan 2005
    Posts
    555
    Quote Originally Posted by wow View Post
    When the string TEXT ("Hello\x5929\x5927, Windows 98!") is changed to TEXT ("Hello\x29\x59\x27\x59, Windows 98!") or TEXT ("Hello\x59\x29\x59\x27, Windows 98!"), the program displays:

    Hello)Y'Y, Windows 98! or HelloY)Y', Windows 98!

    It treats them as the normal ASCII chars.
    That happens because the compiler treats each \x## escape as a single character. When the literal becomes a wide string, each of those characters is widened to a UTF-16 code unit just like any other ASCII character, so \x29\x59 becomes \x00\x29\x00\x59.
    The painful way I talked about would be to store a placeholder character in the string ('?' is a good candidate) and then insert the Unicode character at runtime.
    Code:
    TCHAR str[] = TEXT("Hello??, Windows 98!");
    
    #ifdef UNICODE   /* TEXT() and TCHAR in <windows.h> key off UNICODE */
        str[5] = (0x59 << 8) | 0x29;   /* U+5929 */
        str[6] = (0x59 << 8) | 0x27;   /* U+5927 */
    #endif
    Now a ? will be displayed when compiled in ASCII mode, and the Unicode characters when compiled with UNICODE defined.

    So yeah you'd probably want a string table instead.
    http://www.lischke-online.de/support...f40bcea4c34207

  6. #6
    Registered User
    Join Date
    Oct 2005
    Posts
    42
    Thanks for the solution. But the display is as below:

    HelloII, Windows 98!

    I guess the default CODE PAGE is not UNICODE. Can anyone confirm this for Windows 2000/XP/Vista?

    I know that _UNICODE is defined, because when the statement is written as follows:

    DrawText (hdc, TEXT(str), -1, &rect,
    DT_SINGLELINE | DT_CENTER | DT_VCENTER) ;

    the compiler shows:

    error C2065: 'Lstr' : undeclared identifier
    warning C4047: 'function' : 'const unsigned short *' differs in levels of indirection from 'int '
    warning C4024: 'DrawTextW' : different types for formal and actual parameter 2
    Error executing cl.exe.
    Last edited by wow; 05-14-2007 at 08:07 AM.

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    I still recommend using a string table resource. Then it becomes:

    Code:
    enum { BUFFER_SIZE = 1024 };  /* a const int is not a constant expression in C */
    TCHAR buffer[BUFFER_SIZE];
    int ret = LoadString(GetModuleHandle(NULL), IDS_HELLO, buffer, BUFFER_SIZE);
    if(ret == 0) {
      // String resource not found, possibly not translated?
    } else {
      // buffer contains the string, in proper form.
    }
    The LoadString interface has one obvious drawback: when it copies into your buffer, it doesn't tell you how big the string actually is. To ensure that you've really got it all, you have to repeatedly increase the buffer size, reload, and check whether the returned count is smaller than BUFFER_SIZE - 1.
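The grow-and-reload loop described above could look roughly like this. It is a sketch under stated assumptions: load_fn, demo_load and load_whole_string are hypothetical names, and demo_load only imitates the truncating behaviour of LoadString (copy at most cch - 1 characters, null-terminate, return the count) so the example runs without a resource file.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical loader type with LoadString-like semantics: copy at most
   cch - 1 characters into buf, null-terminate, return the count copied. */
typedef int (*load_fn)(wchar_t *buf, int cch);

/* Stand-in for a real string table entry, so the sketch runs anywhere. */
static const wchar_t demo_text[] = L"Hello\x5929\x5927, Windows 98!";

int demo_load(wchar_t *buf, int cch)
{
    int len = (int)wcslen(demo_text);
    int n;
    if (cch <= 0)
        return 0;
    n = len < cch - 1 ? len : cch - 1;
    wmemcpy(buf, demo_text, (size_t)n);
    buf[n] = L'\0';
    return n;
}

/* Double the buffer until the returned count is below cch - 1, which
   means the string was not truncated. The caller frees the result. */
wchar_t *load_whole_string(load_fn load)
{
    int cch = 8;                 /* deliberately small so the demo must grow */
    wchar_t *buf = NULL;

    for (;;) {
        wchar_t *grown = realloc(buf, (size_t)cch * sizeof *grown);
        if (grown == NULL) {
            free(buf);
            return NULL;
        }
        buf = grown;
        buf[0] = L'\0';          /* defined content even if the load fails */
        if (load(buf, cch) < cch - 1)
            return buf;
        cch *= 2;
    }
}
```

In a real program the callback would wrap LoadString for one resource ID. The LoadStringW documentation also describes passing 0 as the buffer size to receive a read-only pointer to the resource itself; that variant is not shown here.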

  8. #8
    Math wizard
    Join Date
    Dec 2006
    Location
    USA
    Posts
    582
    Windows Unicode strings use 16-bit characters. Are you using the appropriate variable type? I'm not sure if this has anything to do with it, but char is 8-bit and a Windows wide character (WCHAR) is 16-bit.

  9. #9
    Registered User
    Join Date
    Oct 2005
    Posts
    42
    Hello there,

    The problem is now solved as follows:

    CharSet = DEFAULT_CHARSET
    Font = TEXT("MS Gothic")

    Code:
    TCHAR str[] = TEXT("HelloXX, Windows 98!");
    
    #ifdef UNICODE
        str[5] = 0x5929;
        str[6] = 0x5927;
    #endif

    Thanks, everyone.

  10. #10
    Tropical Coder Darryl's Avatar
    Join Date
    Mar 2005
    Location
    Cayman Islands
    Posts
    503
    >>> DrawText (hdc, TEXT(str), -1, &rect,

    You only use the TEXT() macro on literal strings, not on variables, e.g. TEXT("This is a literal string").

    For variables, pass the variable itself. It needs to be based on TCHAR (WCHAR in a Unicode build), which for DrawText is an LPCTSTR (LPCWSTR for DrawTextW).


    As far as drawing non-English characters goes, you also have to make sure that the font you are using has the characters defined.
    Last edited by Darryl; 05-15-2007 at 03:33 PM.
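Why TEXT(str) turned into the unknown identifier Lstr can be shown with a stripped-down macro. This is a sketch: WIDEN is a simplified stand-in for the Unicode-build definition of TEXT(), and the variable Lstr exists here only to make the token pasting visible.

```c
#include <wchar.h>

/* Simplified stand-in for the Unicode-build TEXT(): paste an L prefix
   onto whatever token is passed in. */
#define WIDEN(quote) L##quote

/* A variable that happens to be named Lstr. The posted program had no
   such name, which is why the compiler reported error C2065. */
const wchar_t Lstr[] = L"a variable named Lstr";

const wchar_t *from_literal(void)
{
    return WIDEN("a literal");   /* expands to L"a literal" */
}

const wchar_t *from_identifier(void)
{
    return WIDEN(str);           /* expands to the identifier Lstr */
}
```

The macro works on tokens before compilation, so it can prefix a literal but cannot convert the contents of a variable.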

  11. #11
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    The question is moot. UTF-8? UTF-16? Little endian? Big endian? Unicode is a character set, not an encoding. Asking "how do I use Unicode characters" is not a meaningful question.

  12. #12
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    UTF-16, traded in unsigned shorts and thus endian-less. It's what Windows uses.

    It's a bad use of the term, but Unicode in the context of the WinAPI always means this.
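Byte order only shows up once 16-bit code units are written out as bytes, for example to a file, which is where a BOM (U+FEFF at the start of the data) becomes relevant. A minimal sketch of little-endian serialization follows; the function name utf16_to_le_bytes is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Write UTF-16 code units as little-endian bytes, low byte first.
   out must have room for 2 * count bytes.
   utf16_to_le_bytes is a hypothetical helper name. */
void utf16_to_le_bytes(const uint16_t *units, size_t count, unsigned char *out)
{
    size_t i;
    for (i = 0; i < count; i++) {
        out[2 * i]     = (unsigned char)(units[i] & 0xFF);
        out[2 * i + 1] = (unsigned char)(units[i] >> 8);
    }
}
```

For U+5929 U+5927 this gives the bytes 29 59 27 59, which read as ASCII are the ")Y'Y" seen earlier in the thread. Inside the program, a WCHAR simply holds 0x5929 and no byte order needs to be chosen.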

  13. #13
    Registered User
    Join Date
    Oct 2005
    Posts
    42
    I feel bad: is this the way to code Unicode with VC++? Microsoft has advocated Unicode programming for ten years now. Is this the way to insert Unicode chars into a string?

    Thanks,

  14. #14
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    For the third time, no!

    You only need such symbols when you want localized, i.e. user-visible strings. And localized strings, as MS and every good programmer out there will tell you, should not be hardcoded into the source in the first place, but loaded from some resource file.

  15. #15
    Registered User
    Join Date
    Oct 2005
    Posts
    42
    Sure, you don't include these strings directly in a program, but put them in a resource file. But do you think the resource file can handle the mixing of ASCII chars and Unicode chars correctly? Try it!

    The issue there is exactly the same!!

    Thanks,
