Thread: Unicode and ListView control

  1. #1
    Registered User
    Join Date
    Jul 2004
    Posts
    17

    Unicode and ListView control

    Hi, I was wondering how I can get my ListView control, in Report mode, to work with UTF-8. Everything else works fine; it's just that the multibyte chars aren't displaying correctly. For example, Ω shows up as IŠ or something very similar.

  2. #2
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    UTF-8 is a character encoding scheme for Unicode, a "wide" character set. In the US, the Windows GUI and shell use the ANSI character set (code page 1252). By default, Windows console applications use the extended ASCII character set (code page 437).

    As far as GDI is concerned, it will take the given character code and display the corresponding glyph of the selected font. Use the "character map" application to determine which font and which character set you really want to use.
    If you want to display glyphs from the ASCII character set, most fonts do support that character set (OEM_CHARSET in the US or simply 437).

    If you have a Unicode stream that's UTF-8 encoded, you can try converting it to a wide-character Unicode string using MultiByteToWideChar() with CP_UTF8.
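
To make that conversion concrete, here is a minimal, portable sketch of what decoding UTF-8 into UTF-16 code units involves. It is not the Windows implementation; on Windows the equivalent step would be the MultiByteToWideChar() call with CP_UTF8. The helper name utf8_to_utf16 and its return convention are assumptions made for illustration, and it skips some validation (overlong forms, encoded surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into UTF-16
 * code units. Returns the number of units written (excluding the
 * terminator), or (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;

        /* The lead byte tells us how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        s++;

        while (extra-- > 0) {
            /* Each continuation byte is 10xxxxxx; a NUL here fails too. */
            if ((*s & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 >= cap) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        }
    }
    if (n >= cap) return (size_t)-1;
    dst[n] = 0;                       /* terminate the output */
    return n;
}
```

Given the bytes "Test\xCE\xA9" (the UTF-8 form of "TestΩ"), this writes five code units, the last one being 0x03A9.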

    Since you're trying to display "Ω", I'm betting that you're trying to display ASCII characters, but getting ANSI or Unicode characters.

    gg

  3. #3
    Registered User
    Join Date
    Jul 2004
    Posts
    17
    I tried using wide chars, but I don't think the list view control supports them, since it only showed the first char. Is there any way to change the code page used for the list view control, or any other ideas how I can fix this?
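
One plausible cause of the "only the first char" symptom, assuming the wide string was handed to an ANSI message or function (the thread does not show that code): in UTF-16, every ASCII character is followed by a zero byte, and a char-based API stops at the first zero. The sketch below is portable C with a hypothetical helper name; it only illustrates the byte layout.

```c
#include <stddef.h>
#include <string.h>

/* "Test" followed by U+03A9 and a terminator, laid out as UTF-16LE bytes,
 * which is how a wide string sits in memory on Windows. */
static const unsigned char test_omega_utf16le[] = {
    0x54, 0x00,   /* T */
    0x65, 0x00,   /* e */
    0x73, 0x00,   /* s */
    0x74, 0x00,   /* t */
    0xA9, 0x03,   /* U+03A9, Greek capital omega */
    0x00, 0x00    /* terminator */
};

/* Hypothetical helper: the length a char-based (ANSI) function would see
 * if it were given the same buffer. */
size_t length_seen_by_ansi_api(const unsigned char *utf16le_bytes)
{
    /* strlen stops at the zero byte that follows 'T'. */
    return strlen((const char *)utf16le_bytes);
}
```

Calling length_seen_by_ansi_api(test_omega_utf16le) yields 1, which matches a control showing just "T".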

  4. #4
    the hat of redundancy hat nvoigt
    Join Date
    Aug 2001
    Location
    Hannover, Germany
    Posts
    3,130
    Do you have another control that displays this glyph properly? A textbox, for example? As far as I know, the standard for controls is either ANSI or UTF-16.
    hth
    -nv

    She was so Blonde, she spent 20 minutes looking at the orange juice can because it said "Concentrate."

    When in doubt, read the FAQ.
    Then ask a smart question.

  5. #5
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    >> Is there anyway to change the code page used for the list view control...
    Like I said, GDI simply draws the glyph specified by the selected font. If you want to change the font of any control, you send it the WM_SETFONT message. When you create a font, you specify what character set to use.
    The LOGFONT structure contains an lfCharSet member that you can set to OEM_CHARSET or 473 for the ASCII character set.

    gg

  6. #6
    Registered User
    Join Date
    Jul 2004
    Posts
    17
    I tried changing lfCharSet as you said, and that didn't work. My RichEdit control uses the same data I'm trying to display in my ListView control, and it works fine with EM_SETTEXTEX when I use code page 65001 (CP_UTF8). This is the code I use to make the font for my RichEdit control:

    Code:
    hf = CreateFont(14, 0, 0, 0, 0, FALSE, FALSE, 0, 0, 0, 0, 0, 0, "Arial");
    I tried using OEM_CHARSET like Codeplug said, and that didn't work.

  7. #7
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    If you can zip up and post a small project that demonstrates the problem, I'll take a look at it.

    gg

  8. #8
    Registered User
    Join Date
    Jul 2004
    Posts
    17
    OK, I made the small project. I basically just stripped everything down from the program that I'm having problems with; sorry about the messy code. I used this string in my tests: "TestΩ"

  9. #9
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    >> The LOGFONT structure contains an lfCharSet member that you can set to OEM_CHARSET or 473 for the ASCII character set.
    I gave you some bad information. I confused character sets with code pages and didn't verify the facts before posting... my bad.

    The CP_OEMCP code page uses the traditional IBM or DOS character set. Character sets when creating fonts are about specifying what glyphs within the font are needed for a particular language or set of languages.

    A lot of functions in the Windows API have two versions: a Unicode version and an ANSI version. For example, CreateWindowEx() is simply a macro that calls either CreateWindowExA() for the ANSI version, or CreateWindowExW() for the Unicode version.
    When you use the ANSI APIs, Windows interprets strings using the ANSI code page. When you use the Unicode APIs, strings are UTF-16 and no code page is involved. The MultiByteToWideChar() and WideCharToMultiByte() functions can be used to convert to and from Unicode, using any code page installed on the system.
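
The A/W macro arrangement can be sketched in portable C as follows. TextLengthA, TextLengthW, TextLength and TEXT_LIT are hypothetical stand-ins, not Windows names; they only mimic how a generic name such as CreateWindowEx is selected at compile time by the UNICODE define.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for a pair such as CreateWindowExA/CreateWindowExW.
 * Each returns the number of characters in its argument. */
size_t TextLengthA(const char *s)    { return strlen(s); }
size_t TextLengthW(const wchar_t *s) { return wcslen(s); }

/* The generic name is only a macro, chosen at compile time. */
#ifdef UNICODE
#  define TextLength TextLengthW
#  define TEXT_LIT(s) L##s
#else
#  define TextLength TextLengthA
#  define TEXT_LIT(s) s
#endif

/* Calling through the generic name, as most Win32 code does. */
size_t generic_call(void)
{
    return TextLength(TEXT_LIT("Test"));
}

/* Calling the wide version by its explicit name compiles whether or not
 * UNICODE is defined. */
size_t explicit_wide_call(void)
{
    return TextLengthW(L"Test");
}
```

The point of the second function is that the explicit W name is always available to the compiler; whether a given wide entry point does anything useful on a particular Windows version is a separate question.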

    Here's what happens in your application. "TestΩ" is placed into the first RichEdit via the clipboard as Unicode text. The RichEdit control accepts the text even though it was created using ANSI APIs. The code then gets the text using the EM_GETTEXTEX message while specifying CP_UTF8. This will return a UTF-8 encoded string. When the code sets the text in the second RichEdit, the EM_SETTEXTEX message is used while specifying CP_UTF8. This correctly decodes it back into the Unicode that Windows uses. The ListView control was created using ANSI APIs, so all text is interpreted and displayed using the ANSI code page, and the UTF-8 bytes come out as the wrong characters.
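
The ListView half of that explanation can be illustrated with a small portable sketch: a single-byte code page treats every byte as its own character, so the two UTF-8 bytes of Ω become two unrelated characters. The helper name is hypothetical, and the byte-to-code-point mapping is an assumption that holds for Windows-1252 only in the ranges noted in the comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interpret each byte as one character, the way a
 * single-byte code page does. For bytes 0x00-0x7F and 0xA0-0xFF,
 * Windows-1252 assigns the code point equal to the byte value, which is
 * all this sketch relies on (0x80-0x9F map differently and are not
 * handled). Returns the number of characters produced. */
size_t bytes_as_single_byte_chars(const char *src, uint16_t *dst, size_t cap)
{
    size_t n = 0;
    while (src[n] != '\0' && n + 1 < cap) {
        dst[n] = (uint16_t)(unsigned char)src[n];
        n++;
    }
    if (cap > 0)
        dst[n] = 0;   /* terminate the output */
    return n;
}
```

For the UTF-8 bytes "Test\xCE\xA9" this produces six characters instead of five: the last two are U+00CE (Î) and U+00A9 (©) rather than a single Ω.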

    Here are two options for fixing it.
    - Use ANSI code page only. That means just use non-EX versions for getting and setting text.
    - Use Unicode code page only. That means just use non-EX versions for getting and setting text. Build your source with "UNICODE" and "_UNICODE" defined. Put 'L' in front of all your string literals and use WCHAR or wchar_t instead of char for strings.

    As a side note, you can use the "Terminal" font to force the use of the traditional DOS character set.

    gg

  10. #10
    Registered User
    Join Date
    Jul 2004
    Posts
    17
    I was hoping to avoid the "UNICODE" and "_UNICODE" defines. And since my program needs to support Unicode, I can't just drop Unicode support. I hope I'm not attempting the impossible by keeping 9x support without MSLU while keeping Unicode support.

  11. #11
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    >> 9x support without MSLU while keeping unicode support
    I don't believe there is such a thing...

    Why don't you want to use the MSLU?

    gg

  12. #12
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981

  13. #13
    Registered User
    Join Date
    Jul 2004
    Posts
    17
    *sigh* Oh well guess it's the only way

    EDIT:
    unless anyone else can think of a way I can get the chars to display correctly in my list box
    Last edited by hollowlife1987; 08-16-2004 at 11:35 PM.

  14. #14
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    >>And since my program needs to support unicode
    Why do you want your program to support unicode?

    gg

  15. #15
    Registered User
    Join Date
    Jul 2004
    Posts
    17
    It's a chat client for a p2p network, and you know how people like making their names look cool. The real chat client that the p2p network uses allows Unicode in usernames, so naturally I have to do the same.
    (I asked the developer of the p2p network before I did any major work with my program.)
