Thread: Unicode... Am I missing something here?

  1. #1
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547

    Unicode... Am I missing something here?

    I must be missing something... A #define or a header... something...

    In windows all functions respond to the UNICODE define, giving you the wide string version of the call... but they didn't think to do this in VC++?

    I don't want to even think what my code would look like if I had to do this...

    Code:
    // Unicodetest.cpp
    
    #ifdef _MBCS
    #undef _MBCS
    #endif
    
    #define _UNICODE
    #define UNICODE
    
    
    #include <iostream>
    #include <string>
    #include <tchar.h>
    
    using namespace std;
    
    int main(void)
    {
    #ifdef _UNICODE
      wstring Greet;
    #else
      string Greet;
    #endif
    
      Greet = _T("Hello World!!");
    
    
    #ifdef _UNICODE
      wcout << Greet << endl;
    #else
      cout << Greet << endl;
    #endif
    
      return 0;
    }
    So, what am I missing????

  2. #2
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    UNICODE is used by PSDK
    _UNICODE is used by CRT/MFC/ATL

    Define both or un-define both - preferably at the project settings level.
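
    A minimal sketch of a guard for the "define both or neither" rule. The helper name built_wide is made up for illustration; only the UNICODE and _UNICODE macros come from the post above. The idea is to fail the build early if only one of the two is defined:

```cpp
// Fail the build if only one of the two macros is defined.
#if defined(UNICODE) && !defined(_UNICODE)
#error "UNICODE is defined but _UNICODE is not"
#elif defined(_UNICODE) && !defined(UNICODE)
#error "_UNICODE is defined but UNICODE is not"
#endif

// Hypothetical helper: reports which way this translation unit was built.
constexpr bool built_wide()
{
#if defined(UNICODE) && defined(_UNICODE)
    return true;
#else
    return false;
#endif
}
```

    Putting the check in a shared header would catch a stray #define in a single source file, which is the case project-level settings are meant to avoid.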

    It's good to understand what TCHARs are and why they were invented - but I don't recommend using them. They were good for targeting Win9x/MBCS and NT/Unicode simultaneously. That's not a requirement for new development these days.

    gg

  3. #3
    'Allo, 'Allo, Allo
    Join Date
    Apr 2008
    Posts
    639
    I thought the question was "Why isn't there a std::tstring and std::tout?", to which the response would be:

    typedef std::basic_string<TCHAR> tstring;
    and
    #ifdef _UNICODE
    #define tcout wcout
    #else
    #define tcout cout
    #endif

    But yeah, what Codeplug said. They had their uses, but using them for new code is daft. WinME is now what, 12 years ago.

    Also, in C++, you can let the string type determine which function gets called, rather than a define.
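
    A rough sketch of what that could look like. The function name describe is hypothetical; the point is only that the compiler selects the overload from the argument's string type, with no UNICODE-style macro involved:

```cpp
#include <string>

// Hypothetical overload pair: the narrow version is chosen for std::string...
const char* describe(const std::string&)
{
    return "narrow";
}

// ...and the wide version for std::wstring.
const char* describe(const std::wstring&)
{
    return "wide";
}
```

    Calling describe(std::wstring(L"Hello")) picks the second overload and describe(std::string("Hello")) picks the first, so the call site reads the same either way.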

  4. #4
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> but they didn't think to do this in VC++?
    Ah - you mean in the standard library? You could do that yourself: TString manipulation...

    But there's not really a need these days.

    gg

  5. #5
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Codeplug View Post
    >> but they didn't think to do this in VC++?
    Ah - you mean in the standard library? You could do that yourself: TString manipulation...

    But there's not really a need these days.

    gg
    So... what I'm hearing from this is that I should just use the wstring (etc) type everywhere and do everything in unicode?

    That's doable, I suppose...

    The only exception will be files, but they're always a pain anyway...

  6. #6
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Codeplug View Post
    UNICODE is used by PSDK
    _UNICODE is used by CRT/MFC/ATL

    Define both or un-define both - preferably at the project settings level.
    This I already knew, I got pretty used to that with Pelles C. It uses the same macros.

    What I didn't expect was to have to use different names for everything; as it turns out the alternate names are pretty poorly documented.

  7. #7
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> I should just use the wstring (etc) type everywhere and do everything in unicode?
    Or all narrow. I tend to stick with all wide on Windows.

    >> The only exception will be files, but they're always a pain anyway...
    Well, the wide, formatted I/O interfaces to streams/files in the standard libraries will convert your wide strings to narrow strings based on the LC_CTYPE of the corresponding locale. Newer MS standard libraries do have some extensions in that area for supporting a few Unicode file encodings.
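
    To make that concrete, here is a small sketch, assuming the default "C" locale and an ASCII-only string. The helper name and the file path are made up for illustration. It writes a wide string through std::wofstream and reads the file back as raw bytes to show what reached the disk:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: write `text` through a wide stream, then return the
// raw bytes that ended up in the file.
std::string bytes_written_by_wide_stream(const std::wstring& text,
                                         const char* path)
{
    {
        std::wofstream out(path, std::ios::binary);
        out << text;  // converted to narrow chars by the stream's locale
    }                 // stream is flushed and closed here

    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

    With plain ASCII input the file holds one byte per character, not the in-memory wchar_t representation. What happens to non-ASCII characters depends on the locale the stream is using.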

    gg

  8. #8
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Codeplug View Post
    >> I should just use the wstring (etc) type everywhere and do everything in unicode?
    Or all narrow. I tend to stick with all wide on Windows.
    Hmmmm.... ascii isn't going to work with anything except English, really. It won't make much difference for the statics in dialogs and such but it could pose a real problem for anyone who has filenames etc. in non-latin text... directory listings (for example) would just be scrambled eggs...

    >> The only exception will be files, but they're always a pain anyway...
    Well, the wide, formatted I/O interfaces to streams/files in the standard libraries will convert your wide strings to narrow strings based on the LC_CTYPE of the corresponding locale. Newer MS standard libraries do have some extensions in that area for supporting a few Unicode file encodings.

    gg
    Even that can be a problem... if the system automatically converts (for example) a unicode/cyrillic file to ascii, it's toast... even if it was just sent to me to be reprocessed for some reason. If my locale causes a type conversion, or even an endianness conversion, it's going to scramble the file... not good.

    So... LC_CTYPE is what exactly?

  9. #9
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> ascii isn't going to work with anything except English
    "Narrow" in standard C or Windows isn't just ASCII. There are 8-bit character sets for a whole bunch of languages - in standard C and Windows. Of all the languages that Windows supports, however, there are a handful that are "Unicode-only".

    >> it could pose a real problem for anyone who has filenames etc. in non-latin text...
    Windows filesystems have supported international characters fairly well. They only suffer slightly from certain round-trip issues with ACP<->Unicode conversions. But since NTFS stores names in UTF-16LE, using the wide Win32 APIs typically avoids these issues. FAT/FAT32 can have "what's my encoding" issues (more below).

    >> if the system automatically converts (for example) a unicode/cyrillic file to ascii, it's toast...
    Not really. For those Windows locales that are not Unicode-only, the conversion from Unicode to the system's ANSI code page (ACP) is well defined. And on those systems, notepad.exe expects the typical (8-bit) TXT file to be encoded with that ACP, so when it reads and displays the contents all is good.

    >> even if it was just sent to me to be reprocessed for some reason
    Yes, a common ACP-encoded text file has nothing in it that says "hey! this is my encoding!". So on a system with a different ACP, notepad.exe can map the same bytes to totally different glyphs. The same issue can occur with FAT/FAT32, which stores names using the system's ACP. So changing your own ACP, or swapping thumb-drives with a foreign friend, can result in strange glyphs for filenames. Most folks have learned not to use non-ASCII characters in pathnames in order to mitigate this issue.
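
    As a small illustration of "same bytes, different glyphs", the sketch below decodes a single byte under two code pages. The function names are hypothetical, and only ASCII plus the 0xC0-0xFF range is handled, because that range has a simple mapping in both Windows-1252 and Windows-1251:

```cpp
#include <cstdint>

// U+FFFD marks bytes this sketch does not cover (0x80-0xBF).
constexpr char32_t kNotCovered = 0xFFFD;

// Windows-1252 (Western European): 0xC0-0xFF match Latin-1, so the code
// point equals the byte value.
char32_t decode_cp1252(std::uint8_t b)
{
    if (b < 0x80 || b >= 0xC0) return b;
    return kNotCovered;
}

// Windows-1251 (Cyrillic): 0xC0-0xFF map to U+0410..U+044F.
char32_t decode_cp1251(std::uint8_t b)
{
    if (b < 0x80) return b;
    if (b >= 0xC0) return 0x0410 + (b - 0xC0);
    return kNotCovered;
}
```

    The byte 0xE0 comes out as U+00E0 (a Latin "a" with grave accent) under the first function and as U+0430 (a Cyrillic "a") under the second, which is exactly the mix-up described above when a file or thumb-drive moves between systems.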

    But keep in mind that formats like HTML have solved the "what's my encoding" problem by specifying the encoding.
    For text files on Windows, a Unicode encoding with BOM avoids these issues when using characters outside of the ASCII character set.
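
    A minimal sketch of how a reader could check for such a BOM at the start of a file. The enum and function names are made up for illustration, and UTF-32 is deliberately left out (its little-endian BOM begins with the same two bytes as UTF-16LE, so a fuller detector would have to test for it first):

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: inspect the first bytes of a file's contents.
Bom detect_bom(const std::string& bytes)
{
    auto b = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };

    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return Bom::Utf8;     // EF BB BF
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return Bom::Utf16LE;  // FF FE
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return Bom::Utf16BE;  // FE FF
    return Bom::None;         // no marker: the encoding has to be guessed
}
```

    A result of Bom::None is the "nothing in it that says what my encoding is" case: the reader falls back to an assumption such as the ACP.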

    >> LC_CTYPE is what exactly?
    It's part of standard C's support for locales. For example, when you call "setlocale(LC_ALL, "");", you're saying "I want to use the user's default locale settings". One of those settings is LC_CTYPE, which, among other things, specifies the 8-bit character encoding expected by the standard C char APIs - and it is the encoding to which wchar_t strings are converted in (non-binary) formatted stream I/O.
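
    A short sketch of the same idea in code. Both helper names are hypothetical. The first queries the current LC_CTYPE setting and the second converts a wide string with wcstombs, which uses whatever LC_CTYPE is in effect. Calling wcstombs with a null destination to get the required length is a POSIX extension that MSVC also supports, so treat that part as an assumption about the platform:

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: name of the LC_CTYPE locale currently in effect.
const char* current_ctype()
{
    return std::setlocale(LC_CTYPE, nullptr);  // query only, changes nothing
}

// Hypothetical helper: convert wide to narrow using the current LC_CTYPE.
// Returns false if some character cannot be represented in that encoding.
bool narrow_with_current_locale(const std::wstring& wide, std::string& out)
{
    // Null destination: ask how many bytes the result needs (extension).
    std::size_t needed = std::wcstombs(nullptr, wide.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1))
        return false;

    out.assign(needed, '\0');
    std::wcstombs(&out[0], wide.c_str(), needed);
    return true;
}
```

    A program that wants the user's settings would call setlocale(LC_ALL, "") once at startup. Without that call it runs in the "C" locale, where only basic characters are guaranteed to convert.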

    gg

  10. #10
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Thank you Codeplug... that was an excellent explanation.

    So I should take it that within a given area, especially on the same machine, this is something of a non-issue if everything is Unicode.

    The reason it's a matter of some concern is that I've already bumped into it with my current Freeware offering... One of the first things to happen was that I installed a copy on a friend's computer, and he has a number of folders (his music collection in particular) that were set up in the Ukraine before he immigrated here... So he's actually got a mix of text on his hard disk. I solved the problem by re-writing the project in Unicode and basically not caring what was in any text strings it worked with.

    I did however, stay with the idea of being able to compile it both ways; primarily out of habit, I think.

    But I'll take your assurances and I do appreciate your help...
    As the one page I stumbled through looking up the LC_CTYPE thing said "Say goodbye to Char*" ....

    Thanks again.

  11. #11
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> he has a number of folders (his music collection in particular) that were set up in the Ukraine before he immigrated here.
    If you use the ANSI Win32 APIs, then Windows tries to convert the Unicode pathnames to the system's ACP. If the system's ACP does not support a particular glyph, you end up with something like "?". Or, even more fun, a "best fit" character is found, so the pathname looks correct but doesn't work.
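
    The "?" effect can be mimicked with a portable stand-in. This is not the Win32 conversion itself, just a hypothetical helper that pretends the code page is plain ASCII and replaces everything else with '?':

```cpp
#include <string>

// Hypothetical stand-in for a lossy wide-to-ACP conversion where the code
// page only covers ASCII. Sets *lost to true if any character was replaced.
std::string to_ascii_lossy(const std::wstring& wide, bool* lost = nullptr)
{
    std::string narrow;
    bool any_lost = false;

    for (wchar_t ch : wide) {
        if (static_cast<unsigned long>(ch) < 0x80) {
            narrow.push_back(static_cast<char>(ch));
        } else {
            narrow.push_back('?');  // no representation in the target set
            any_lost = true;
        }
    }

    if (lost) *lost = any_lost;
    return narrow;
}
```

    Two different folder names can collapse to the same narrow string this way, and '?' is not even a legal filename character on Windows, so the narrowed path can no longer be used to open the folder.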

    Going wide on windows helps to avoid this kind of fun.

    gg

  12. #12
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Codeplug View Post
    Going wide on windows helps to avoid this kind of fun.

    gg
    Yeah, once I did the re-compile his system displayed everything just fine... I did, however, caution him not to rename anything until I got back....
