Thread: ANSI or UNICODE

  1. #1
    Registered User
    Join Date
    Mar 2011
    Posts
    53

    Question ANSI or UNICODE

    I'm programming Win32, and I noticed that every function has an ANSI version (for example, CreateWindowA) and a wide-character version (CreateWindowW in this case). Can I use the ANSI versions freely, without any data loss?

    I read somewhere that Unicode is faster. Is that true?

  2. #2
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by new_in_c++
    I'm programming Win32, and I noticed that every function has an ANSI version (for example, CreateWindowA) and a wide-character version (CreateWindowW in this case). Can I use the ANSI versions freely, without any data loss?
    Not if you're processing Unicode text, you can't.

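    The Unicode (wide) versions of Windows API calls are selected automatically when you use ...
    Code:
    #define UNICODE
    #define _UNICODE
    ... at the top of your source files. They must come before #include <windows.h> (and <tchar.h>), so the safest place is the very first lines. UNICODE switches the Windows headers; _UNICODE switches the C runtime's <tchar.h> mappings.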

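    From there on you use the WCHAR types (WCHAR, PWCHAR, LPCWSTR, etc.) instead of the CHAR types, and the wcs versions of the C string functions (wcslen(), wcscpy(), ...) instead of the str versions.
    Alternatively, you can use the TCHAR types throughout; they switch automatically with the UNICODE define as well... In that case use the _tcs versions of the string functions (_tcslen(), _tcscpy(), ...) from <tchar.h>, which follow _UNICODE.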
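    To see how the TCHAR "autoswitch" works, here is a portable sketch of the same mechanism. It deliberately avoids <windows.h> and <tchar.h>: MY_UNICODE, MYCHAR, MYTEXT(), my_tcslen and my_tcscpy are hypothetical stand-ins for UNICODE/_UNICODE, TCHAR, TEXT()/_T(), _tcslen() and _tcscpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for UNICODE/_UNICODE: one switch flips everything below. */
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t MYCHAR;            /* like TCHAR == WCHAR */
#define MYTEXT(s) L##s             /* like TEXT("x") == L"x" */
#define my_tcslen wcslen           /* like _tcslen == wcslen */
#define my_tcscpy wcscpy           /* like _tcscpy == wcscpy */
#else
typedef char MYCHAR;               /* like TCHAR == CHAR */
#define MYTEXT(s) s
#define my_tcslen strlen
#define my_tcscpy strcpy
#endif

/* Written once against the generic names; compiles either way. */
size_t copy_greeting(MYCHAR *dst, size_t cap)
{
    const MYCHAR *src = MYTEXT("Hello");
    size_t n = my_tcslen(src);
    assert(n < cap);               /* caller must leave room for the terminator */
    my_tcscpy(dst, src);
    return n;                      /* length in characters, not bytes */
}
```

    Delete the #define MY_UNICODE line and the same copy_greeting() compiles against char, strlen() and strcpy() instead. The real headers do the same thing, keyed on UNICODE (<windows.h>) and _UNICODE (<tchar.h>), which is why the two are normally defined together.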

    But Unicode is far from simple; in fact, it's a royal pain in the backside...
    Unicode is not one thing... there are at least five major encodings: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE. The numbers give the size of one code unit in bits (a character can take several units: 1 to 4 bytes in UTF-8, one or two 16-bit units in UTF-16). le means "Little Endian" which is most Windows systems, be means "Big Endian" which pretty much means "everyone else". So not only do you have to convert character sizes, you end up re-ordering the bytes inside each character as well.
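    To make the unit-size and byte-order points concrete, here is a sketch that encodes one code point in each of the five forms. The function names are hypothetical; the bit layouts are the standard UTF-8, UTF-16 (with surrogate pairs above U+FFFF) and UTF-32 ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-8: 1 to 4 bytes per code point, no byte order to worry about. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Store one 16-bit unit in the requested byte order. */
static void put16(unsigned char *p, uint16_t u, int big_endian)
{
    p[big_endian ? 0 : 1] = (unsigned char)(u >> 8);
    p[big_endian ? 1 : 0] = (unsigned char)(u & 0xFF);
}

/* UTF-16LE/BE: one 16-bit unit, or a surrogate pair above U+FFFF. */
size_t encode_utf16(uint32_t cp, int big_endian, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000) { put16(out, (uint16_t)cp, big_endian); return 2; }
    cp -= 0x10000;
    put16(out,     (uint16_t)(0xD800 | (cp >> 10)),   big_endian);
    put16(out + 2, (uint16_t)(0xDC00 | (cp & 0x3FF)), big_endian);
    return 4;
}

/* UTF-32LE/BE: always 4 bytes holding the code point itself. */
size_t encode_utf32(uint32_t cp, int big_endian, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        out[i] = (unsigned char)(cp >> shift);
    }
    return 4;
}
```

    For example, U+20AC (the euro sign) comes out as E2 82 AC in UTF-8, AC 20 in UTF-16LE, 20 AC in UTF-16BE and 00 00 20 AC in UTF-32BE.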

    Lots and lots to read up on... HERE

    Theoretically, Unicode text files are supposed to start with a Byte Order Mark (BOM) that makes identifying the file's encoding easier. But it's not always there, so you are stuck having to "discover" the file's content.
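    Checking for a BOM just means comparing the first few bytes against the known signatures. A portable sketch; the enum and function names are hypothetical, the byte sequences are the standard BOMs. Note that UTF-32LE has to be tested before UTF-16LE, because both start with FF FE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for the encodings listed above. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF32LE, BOM_UTF32BE };

/* Inspect the start of a buffer; *skip receives the BOM length in bytes. */
enum bom_kind detect_bom(const unsigned char *p, size_t n, size_t *skip)
{
    assert(skip != NULL);
    *skip = 0;
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        { *skip = 4; return BOM_UTF32LE; }
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        { *skip = 4; return BOM_UTF32BE; }
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        { *skip = 3; return BOM_UTF8; }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        { *skip = 2; return BOM_UTF16LE; }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        { *skip = 2; return BOM_UTF16BE; }
    return BOM_NONE;               /* no BOM: the content has to be probed */
}
```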

    Windows (up to 7, at least) is UTF-16LE internally.

    For UTF-16BE you will need to reverse the byte order of each 16-bit unit yourself.
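    Reversing UTF-16BE is only a swap of the two bytes in every 16-bit unit; surrogate pairs need no special handling, because each half is swapped on its own. A minimal sketch (flip_utf16 is a hypothetical name, not the FlipEndian() helper used in the code below, whose source isn't shown):

```c
#include <assert.h>
#include <stddef.h>

/* Swap the bytes of every 16-bit unit in place: UTF-16BE <-> UTF-16LE.
   A trailing odd byte (truncated data) is left untouched. */
void flip_utf16(unsigned char *buf, size_t nbytes)
{
    assert(buf != NULL || nbytes == 0);
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        unsigned char t = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = t;
    }
}
```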

    Windows provides MultiByteToWideChar() and WideCharToMultiByte() for converting to and from UTF-8 (pass CP_UTF8 as the code page), which is the current Internet standard.
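    On Windows the usual pattern is to call MultiByteToWideChar(CP_UTF8, 0, src, -1, NULL, 0) once to get the required buffer size (in WCHARs, including the terminator) and again to do the conversion. To show what that conversion involves, here is a portable sketch of UTF-8 to UTF-16. utf8_to_utf16 is a hypothetical name; it uses uint16_t because wchar_t is 32 bits on most non-Windows compilers, and it validates far less than the real API (overlong forms, for one, are accepted).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n bytes of UTF-8 into UTF-16 units (out holds cap units).
   Returns the number of units written, or (size_t)-1 on malformed
   input or a buffer that is too small. */
size_t utf8_to_utf16(const unsigned char *in, size_t n, uint16_t *out, size_t cap)
{
    size_t i = 0, w = 0;
    assert(in != NULL && out != NULL);
    while (i < n) {
        uint32_t cp;
        size_t len;
        unsigned char b = in[i];
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else return (size_t)-1;                  /* stray continuation byte */
        if (i + len > n) return (size_t)-1;      /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((in[i + k] & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (in[i + k] & 0x3F); /* 6 payload bits per byte */
        }
        i += len;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return (size_t)-1;
        if (cp < 0x10000) {
            if (w + 1 > cap) return (size_t)-1;
            out[w++] = (uint16_t)cp;
        } else {                                 /* needs a surrogate pair */
            if (w + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            out[w++] = (uint16_t)(0xD800 | (cp >> 10));
            out[w++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return w;
}
```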

    UTF-32 is not directly supported, but conversion libraries are becoming available.
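    The core of such a conversion is small: read each 4-byte unit in the file's byte order, then emit one UTF-16 unit or a surrogate pair. A sketch under those assumptions (utf32_to_utf16 is a hypothetical name, not a Windows API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a UTF-32 byte stream (LE or BE) to UTF-16 code units.
   Returns units written, or (size_t)-1 on bad input or a short buffer. */
size_t utf32_to_utf16(const unsigned char *in, size_t nbytes, int big_endian,
                      uint16_t *out, size_t cap)
{
    size_t w = 0;
    assert(nbytes % 4 == 0);                     /* whole 32-bit units only */
    for (size_t i = 0; i < nbytes; i += 4) {
        const unsigned char *p = in + i;
        uint32_t cp = big_endian
            ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3]
            : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return (size_t)-1;
        size_t need = cp < 0x10000 ? 1 : 2;
        if (w + need > cap) return (size_t)-1;
        if (need == 1) {
            out[w++] = (uint16_t)cp;
        } else {                                 /* above U+FFFF: surrogate pair */
            cp -= 0x10000;
            out[w++] = (uint16_t)(0xD800 | (cp >> 10));
            out[w++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return w;
}
```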


    I read somewhere that Unicode is faster. Is that true?
    Given all that, here is what it takes to open a "plain text" playlist file in a Unicode world. I'll let you decide...
    Code:
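    // open and translate playlist
    // Relies on helpers defined elsewhere in the program: a global
    // WCHAR FilePath[MAX_PATH], Exception() (raises an SEH exception with the
    // given code), FlipEndian(), CopyWchar() and CopyMByte().
    // Headers: <windows.h>, <shlwapi.h> (link shlwapi.lib), <stdlib.h>, <string.h>, <wchar.h>
    BOOL M3UOpen(PWCHAR FileName)
      { PBYTE  rf;      // raw file data
        DWORD  br;      // bytes read
        // load the raw file
        { HANDLE pl;    // playlist file handle
          DWORD  fs;    // file size
          DWORD  err;   // error code saved before cleanup
          // get path to file (wcsncpy does not terminate a truncated copy)
          wcsncpy(FilePath,FileName,MAX_PATH - 1);
          FilePath[MAX_PATH - 1] = L'\0';
          PathRemoveFileSpec(FilePath);
          wcscat(FilePath,L"\\");
          // open the file
          pl = CreateFile(FileName,GENERIC_READ,0,NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL);
          if (pl == INVALID_HANDLE_VALUE)
            Exception(GetLastError());
          fs = GetFileSize(pl,NULL);
          // 4 spare zero bytes: a UTF-16 terminator, plus room for the
          // DWORD-sized reads below even on very short files
          rf = calloc(fs + 4, sizeof(BYTE));
          if (! rf)
            { CloseHandle(pl);
              Exception(ERROR_NOT_ENOUGH_MEMORY); }
          if (! ReadFile(pl, rf, fs, &br, NULL))
            { err = GetLastError();
              CloseHandle(pl);
              free(rf);
              Exception(err); }
          CloseHandle(pl);
          if (br != fs)
            { free(rf);
              Exception(0xE0640007); } }
        __try                                   // Microsoft-style SEH (MSVC, Pelles C)
         { DWORD bom = *(DWORD*)rf;             // first 4 bytes as a little endian DWORD
           if ((bom == 0x0000FEFF) || (bom == 0xFFFE0000))  // utf32le or utf32be bom
             Exception(0xE0640002);                         // utf32 is not supported
           else if ((bom & 0xFFFF) == 0xFFFE)               // utf16be bom (FE FF)
             { FlipEndian(rf,br);
               CopyWchar((PWCHAR) rf + 1); }                // + 1 WCHAR skips the bom
           else if ((bom & 0xFFFF) == 0xFEFF)               // utf16le bom (FF FE)
             CopyWchar((PWCHAR) rf + 1);
           else if ((bom & 0xFFFFFF) == 0xBFBBEF)           // utf8 bom (EF BB BF)
             CopyMByte(rf + 3, br - 3);
           else                                             // no known bom, probe the file
             { if (! memchr(rf, 0x00, br))                  // 8 bit text has no nulls
                 CopyMByte(rf,br);                          // ansi / utf8 no bom
               else
                { PBYTE lf = memchr(rf,0x0A,br);            // every encoding stores lf as a 0x0A byte
                  if (!lf)
                    Exception(0xE0640003);
                  if (((lf - rf >= 3) && !(*(DWORD*)(lf - 3) & 0x00FFFFFF)) ||  // utf32be no bom
                       (!(*(DWORD*)lf & 0xFFFFFF00)))                           // utf32le no bom
                     Exception(0xE0640002);
                  if ((lf - rf) & 1)                        // big endian? (lf at odd offset)
                    FlipEndian(rf,br);                      // utf16be no bom
                  CopyWchar((PWCHAR) rf);  } } }            // utf16le no bom
        __finally                               // runs on normal exit and on exceptions
         { free(rf); }
        return TRUE; }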
    The real annoyance is that unless you are writing strictly for personal use in English, you are almost forced to use Unicode.
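    The no-BOM branch of M3UOpen() above is the trickiest part. Here is a portable, bounds-checked sketch of the same heuristic (no zero bytes means 8-bit text; otherwise look at where the first 0x0A byte sits). The names are hypothetical, and the 4-byte alignment test on the UTF-32 patterns is an addition of this sketch, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for what the probe can conclude. */
enum guess { GUESS_8BIT, GUESS_UTF16LE, GUESS_UTF16BE, GUESS_UTF32, GUESS_UNKNOWN };

enum guess probe_no_bom(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    if (memchr(p, 0x00, n) == NULL) return GUESS_8BIT;   /* ANSI or UTF-8 */
    const unsigned char *lf = memchr(p, 0x0A, n);
    if (lf == NULL) return GUESS_UNKNOWN;
    size_t off = (size_t)(lf - p);
    /* UTF-32LE line feed is 0A 00 00 00 at a 4-byte boundary */
    if (off % 4 == 0 && off + 3 < n && lf[1] == 0 && lf[2] == 0 && lf[3] == 0)
        return GUESS_UTF32;
    /* UTF-32BE line feed is 00 00 00 0A, so 0A sits at offset 3 mod 4 */
    if (off % 4 == 3 && lf[-1] == 0 && lf[-2] == 0 && lf[-3] == 0)
        return GUESS_UTF32;
    /* UTF-16LE line feed is 0A 00 (even offset), UTF-16BE is 00 0A (odd) */
    return (off & 1) ? GUESS_UTF16BE : GUESS_UTF16LE;
}
```

    Like the original, this is only a guess: a file with no line feed at all, or unusual content around the first one, can still be misidentified.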
    Last edited by CommonTater; 03-13-2011 at 12:00 PM.

  3. #3
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    You can use ANSI without problems so long as you don't use non-English characters.
    If you are planning to use Unicode, I'd suggest a library, such as UTF8-CPP: UTF-8 with C++ in a Portable Way.
    Quote Originally Posted by Adak
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Elysia
    Check out "his" other threads.
    Sorry... I did just that ... post removed.

  5. #5
    Registered User
    Join Date
    Mar 2011
    Posts
    53
    Thanks, but I have another question. You stated that I could use

    Code:
    #define UNICODE
    to automatically select Unicode, so can I write:

    Code:
    #define ANSI
    the same way?

  6. #6
    Programming Wraith (GReaper)
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,739
    Quote Originally Posted by Elysia
    You can use ANSI without problems so long as you don't use non-English characters.
    I think the problem occurs when you want to port a program from one language version of an OS (e.g. Greek) to another (e.g. English). If you've typed language-specific letters on the former, the latter will display them quite differently. It's the eternal problem of character encoding. Even Unicode, which was created to address this issue, fell prey to it!
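    A small illustration of that problem: the same byte is a different letter under different Windows code pages. In Windows-1253 (Greek) byte 0xE1 is α (U+03B1); in Windows-1252 (Western European) it is á (U+00E1). The sketch below decodes only those slices of the two code pages (0xA0 to 0xFF for 1252, the lowercase Greek block 0xE0 to 0xFE for 1253). The function names are hypothetical; a real program would pass the source code page to MultiByteToWideChar() instead.

```c
#include <assert.h>
#include <stdint.h>

/* Partial decoder: in Windows-1252, 0xA0..0xFF equal Latin-1 (U+00A0..U+00FF). */
uint32_t cp1252_to_unicode(unsigned char b)
{
    assert(b >= 0xA0);
    return b;
}

/* Partial decoder: in Windows-1253, 0xE0..0xFE map to U+03B0..U+03CE. */
uint32_t cp1253_to_unicode(unsigned char b)
{
    assert(b >= 0xE0 && b <= 0xFE);
    return (uint32_t)b + 0x2D0;
}
```

    Text saved as 0xE1 on a Greek system therefore shows up as á on a Western one: nothing in the byte itself says which table to use.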
    Devoted my life to programming...

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by new_in_c++
    Thanks, but I have another question. You stated that I could use

    Code:
    #define UNICODE
    to automatically select Unicode, so can I write:

    Code:
    #define ANSI
    the same way?
    Nope... just don't define UNICODE. Without it, the headers select the ANSI (A) versions, which use the system's ANSI code page...

  8. #8
    Registered User
    Join Date
    Mar 2011
    Posts
    53
    But I didn't define anything, and all my functions take wide characters by default. (I'm using VC++ 2008 at the moment.)

  9. #9
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    It's a project setting. And you really need to upgrade to 2010.

  10. #10
    Registered User (NeonBlack)
    Join Date
    Nov 2007
    Posts
    431
    Quote Originally Posted by CommonTater
    "Little Endian" which is most Windows systems, be means "Big Endian" which pretty much means "everyone else".
    That's not true at all. Anything running an Intel or ARM chip is going to be little endian (actually, I think ARM can be switched, but every device I've worked on has had it set to little). This includes Windows, Mac and Linux as well as every major smartphone/mobile platform. The only time I've ever used a big endian system was Solaris on SPARC (not OpenSolaris; that's x86). The only other big endian system you might have a slim chance of encountering is PowerPC on older Macs.
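    If a program needs to know which byte order it is running on (for instance before reinterpreting file bytes as DWORDs, as M3UOpen() does), a portable check is to look at the lowest-addressed byte of a known multi-byte value. A minimal sketch; is_little_endian is a hypothetical helper name. On x86 and x64 it returns 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little endian machine, 0 on a big endian one. */
int is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* copy the lowest-addressed byte */
    assert(first == 0x01 || first == 0x02);
    return first == 0x02;           /* low-order byte stored first */
}
```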
    I copied it from the last program in which I passed a parameter, which would have been pre-1989 I guess. - esbo

  11. #11
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by NeonBlack
    That's not true at all. Anything running an Intel or ARM chip is going to be little endian (actually, I think ARM can be switched, but every device I've worked on has had it set to little). This includes Windows, Mac and Linux as well as every major smartphone/mobile platform. The only time I've ever used a big endian system was Solaris on SPARC (not OpenSolaris; that's x86). The only other big endian system you might have a slim chance of encountering is PowerPC on older Macs.
    Hey thanks for that... I just revisited the link and see that you're correct.

    Oddly enough, I just got handed over 1000 M3U playlists to convert to UTF-8 for the EXM3U spec (yeah, I know, old stuff), and something like 700 of them are big endian... So maybe a little "experiential skew" got in there.

    Quote Originally Posted by new_in_c++
    But I didn't define anything, and all my functions take wide characters by default. (I'm using VC++ 2008 at the moment.)
    It's not a compiler default, so it must be coming from the IDE. Look in the project and compiler settings for Code::Blocks... select the VC compiler and look for "extra defines" or such.

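    Of course you can also turn it off by placing this at the top of your files, before #include <windows.h>...
    Code:
    // #undef of a macro that isn't defined is harmless, so no #ifdef guard is needed
    #undef UNICODE
    #undef _UNICODE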

    Afterthought... how are you calling your WinAPI functions? By the generic names, e.g. FindFirstFile()? With rare exceptions, you should not call the specific A or W versions directly.
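    The generic names work because the Windows headers declare both variants and then map the plain name with a macro, along the lines of #define FindFirstFile FindFirstFileW when UNICODE is defined and FindFirstFileA otherwise. A portable imitation of that pattern; CountCharsA/CountCharsW, CountChars, MY_UNICODE and MYTEXT() are hypothetical names, not real API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Two explicit variants, like FindFirstFileA / FindFirstFileW. */
size_t CountCharsA(const char *s)    { assert(s != NULL); return strlen(s); }
size_t CountCharsW(const wchar_t *s) { assert(s != NULL); return wcslen(s); }

/* The generic name is only a macro, selected by one switch. */
#define MY_UNICODE
#ifdef MY_UNICODE
#define CountChars CountCharsW
#define MYTEXT(s) L##s
#else
#define CountChars CountCharsA
#define MYTEXT(s) s
#endif

/* Code written against the generic name follows the switch automatically. */
size_t count_greeting(void)
{
    return CountChars(MYTEXT("Hello"));
}
```

    Calling CountCharsA() directly in code built with MY_UNICODE would force the narrow path and a mismatched string type, which is the same reason to prefer FindFirstFile() over FindFirstFileA()/FindFirstFileW().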
    Last edited by CommonTater; 03-13-2011 at 05:10 PM. Reason: Afterthought.

  12. #12
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by CommonTater
    ...It's not a compiler default, so it must be coming from the IDE. Look in the project and compiler settings for Code::Blocks... select the VC compiler and look for "extra defines" or such...
    Did you miss the part about using Visual Studio?
    It's under Project Properties -> Configuration Properties -> General -> Character Set.
    And again, I urge everyone using 2008 to upgrade to 2010. Stop being stuck in the past.

  13. #13
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Elysia
    Did you miss the part about using Visual Studio?
    It's under Project Properties -> Configuration Properties -> General -> Character Set.
    And again, I urge everyone using 2008 to upgrade to 2010. Stop being stuck in the past.
    Isn't that the compiler that's included in the Windows SDK?

    Lots of people use that with Code::Blocks.

  14. #14
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Visual Studio is an IDE. You are thinking of the compiler that comes with the IDE.

  15. #15
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Elysia
    Visual Studio is an IDE. You are thinking of the compiler that comes with the IDE.
    Yeah... the compiler that comes with the Windows 7 SDK is from MSVC++ 2008... ??


Similar Threads

  1. Converting unicode filenames to ANSI
    By CaptainKirk in forum C++ Programming
    Replies: 3
    Last Post: 08-02-2010, 09:28 AM
  2. Dealing with Unicode and ANSI - Templates
    By Tonto in forum C++ Programming
    Replies: 9
    Last Post: 06-15-2007, 03:57 PM
  3. <string> to LPCSTR? Also, character encoding: UNICODE vs ?
    By Kurisu33 in forum C++ Programming
    Replies: 7
    Last Post: 10-09-2006, 12:48 AM
  4. Unicode v ANSI Calls
    By Davros in forum Windows Programming
    Replies: 3
    Last Post: 04-18-2006, 09:35 AM
  5. UNICODE and GET_STATE
    By Registered in forum C++ Programming
    Replies: 1
    Last Post: 07-15-2002, 03:23 PM
