Wide string to Ansi String


  1. #1
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812

    Wide string to Ansi String

    Is there any standard routine for converting wide strings into ansi strings?
    OS: Windows XP
    Compilers: MinGW (Code::Blocks), BCB 5

    BigAngryDog.com

  2. #2
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,434
    What's an ANSI string?
    Do you mean an ASCII string?

    You could use UTF-8 to encode the wide string in an ASCII string.

  3. #3
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,643

  4. #4
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812
    I simply want the reverse of _T, which is commonly used in Windows.

    Is this something I need to write?

    I mean to simply strip off the high-order bytes. So effectively, only the first 256 character codes can be translated, and '?' or something similar is shown when the character has a value larger than 255.

    Thanks for the multi-byte suggestion, but I don't think it's suitable because I want to preserve ANSI characters (i.e. those outside ASCII).
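
    The truncation described above can be written by hand in a few lines. This is only a sketch: narrow_lossy is a hypothetical helper name, not a standard routine, and it assumes the low 256 wide-character values line up with the target 8-bit character set. That holds for Latin-1, but Windows-1252 differs from it in the 0x80-0x9F range.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any standard library): keep wide
// characters whose code is 0-255 as a single byte, and substitute a
// fallback character for everything else.
std::string narrow_lossy(const std::wstring& wide, char fallback = '?')
{
    std::string out;
    out.reserve(wide.size());
    for (std::size_t i = 0; i < wide.size(); ++i) {
        // Go through an unsigned type so a signed wchar_t cannot wrap
        // a large code into the 0-255 range by accident.
        unsigned long code = static_cast<unsigned long>(wide[i]);
        if (code <= 0xFFul)
            out.push_back(static_cast<char>(static_cast<unsigned char>(code)));
        else
            out.push_back(fallback);
    }
    return out;
}
```

    Calling narrow_lossy(L"Hello") gives back "Hello" unchanged; a Cyrillic or CJK character comes out as the fallback.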

  5. #5
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,643
    >> I simply want the reverse of _T...
    _T(x) is a macro that will put "L" in front of "x" if _UNICODE is defined. Otherwise it simply evaluates to "x".
    Don't complicate your code by using _T(), TEXT(), or TCHARs unless you actually need to support both a UNICODE and an ANSI target at the same time.

    >> Is this something I need to write?
    wcstombs() and mbstowcs() are what you use to convert wchar_t strings to and from char strings. They use the code page setting of the current locale to determine what type of "multibyte character" to use. The default locale is "C", which under Microsoft's C runtime uses the ANSI character set and a 1-byte character encoding. So a "multibyte" char string under the "C" locale is simply a null-terminated C-string.

    >> I mean to simply strip off the high-order bytes.
    That's basically what it does under the "C" locale.

    >> So effectively, only the first 256 bytes can be translated...
    Since the "C" locale uses the ANSI character set, which uses the values 0-255, yes (under the "C" locale). You can get different behavior under different locales. And you have a lot more conversion options when using the WideCharToMultiByte() and MultiByteToWideChar() APIs if you need them.

    >> I want to preserve ANSI characters (i.e. those outside ASCII).
    Both ANSI and extended-ASCII character codes range from 0-255. 7-bit ASCII is 0-127. In other words, any character code between 0-255 will be preserved under the "C" locale when using mbstowcs() and wcstombs().

    gg
    Last edited by Codeplug; 08-07-2004 at 03:22 PM.
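
    For reference, a minimal sketch of calling wcstombs() from C++. The wrapper name wide_to_multibyte is made up for illustration. What happens to codes 128-255 depends on the C runtime and the current locale, so the sketch reports failure rather than assuming those characters survive.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical wrapper around wcstombs(): converts using whatever C locale
// is currently in effect. Returns false if the runtime reports that some
// character cannot be represented in that locale.
bool wide_to_multibyte(const std::wstring& wide, std::string& out)
{
    // Worst case is MB_CUR_MAX bytes per wide character, plus a terminator.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(&buf[0], wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;            // conversion failed for this locale
    out.assign(&buf[0], n);      // n excludes the terminating null
    return true;
}
```

    Plain ASCII text converts under any locale. For anything beyond that, check the return value instead of assuming a '?' substitution.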

  6. #6
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812
    I'm glad I asked this question - for the first time I need to mix ANSI and UNICODE. Thanks for the reply CodePlug.

    >_T(x) is a macro that will put "L" in front of "x"

    OK. What does the definition of _T look like? What is the syntax for placing 'L' in front of 'x'? Is it something like this?

    std::wstring s = L"Hello World and his Dog!";


    >Both ANSI and extended-ASCII character codes range from 0-255. 7-bit ASCII is 0-127. In other words, any character code between 0-255 will be preserved under the "C" locale when using mbstowcs() and wcstombs().

    I was of the understanding that only ASCII (0-127) were preserved. Thanks for telling me otherwise.


    >You can get different behavior under different locales.

    Am I correct in thinking the C locale is default (irrespective of the platform) and that I would have to change the locale explicitly in my code for it to be otherwise?

    What I am asking is: if I make assumptions based on the C locale, is my program going to come unstuck when run on a Russian PC?
    Last edited by Davros; 08-07-2004 at 05:13 PM.

  7. #7
    Code Monkey Davros's Avatar
    Join Date
    Jun 2002
    Posts
    812
    To follow on from my last question, what I'm really concerned about is that my streams use the C locale by default, as defined by:

    ios_base::getloc()

    Is this the case? From the C++ documentation it appears so, but I'm not 100% sure.
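
    One way to settle this on a given toolchain is to ask a freshly constructed stream which locale it is imbued with. A small sketch follows; the function name is made up for illustration, and the expectation in the comment is something to verify on your own compiler, not a guarantee from this thread.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: name of the locale a new stream starts out with.
// A stream copies the global C++ locale when it is constructed, so before
// any call to std::locale::global() this is expected to match
// std::locale::classic(), whose name is "C".
std::string fresh_stream_locale_name()
{
    std::ostringstream os;
    return os.getloc().name();
}
```

    Printing fresh_stream_locale_name() at the top of main() shows what your streams actually use before anything in the program changes it.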

  8. #8
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,643
    Well, after some research: the standard has every program start in the "C" locale (the equivalent of setlocale(LC_ALL, "C") is executed at startup), so you only get the default locale for your country/region if you ask for it explicitly, e.g. with setlocale(LC_ALL, "").
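
    The C library's startup locale can also be queried directly, which avoids relying on anyone's reading of the standard. A minimal sketch, with a made-up function name:

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: report the C library's current locale without
// changing it. Passing a null pointer to setlocale() is a pure query.
std::string current_c_locale()
{
    const char* name = std::setlocale(LC_ALL, NULL);
    return name ? name : "";
}
```

    If this is called before the program has made any setlocale() call of its own, the result is the locale the runtime started in. Calling setlocale(LC_ALL, "") afterwards switches to the user's environment locale, which is where a Russian PC would start to behave differently.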

    As far as the "C" locale is concerned, I think the only guarantee on character glyphs is that you'll get the English, 7-bit, printable, ASCII glyphs (32-126) and a few control codes for standard escape sequences '\a', '\b', '\t', etc...

    Other than that, you are relying on a particular OS or standard C/C++ implementation. For example, the "C" locale under Microsoft's C runtime uses the extended ASCII character set.

    You can look up the _T() macro in tchar.h, and yes, it will prepend your string literals with an "L" if _UNICODE is defined.
    The TEXT() macro is part of the platform SDK and does the exact same thing, except it uses the UNICODE define (no leading underscore).
    The reason there are two of them is that tchar.h provides a TCHAR interface to most of the MS C library, and they didn't want to make it dependent on any platform SDK defines or macros.
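
    To show the mechanism without quoting tchar.h itself, here is a rough imitation. MY_UNICODE, MY_T, my_tchar and my_tstring are hypothetical stand-ins for _UNICODE, _T, TCHAR and a TCHAR-based string; the real header is organized differently.

```cpp
#include <string>

// Hypothetical stand-in for _UNICODE; comment it out for the narrow build.
#define MY_UNICODE 1

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x        // token-paste an L prefix: MY_T("hi") becomes L"hi"
#else
typedef char my_tchar;
#define MY_T(x) x           // narrow build: the literal is left untouched
#endif

typedef std::basic_string<my_tchar> my_tstring;

// The same source line yields a wide or a narrow literal depending on the define.
inline my_tstring greeting()
{
    return MY_T("Hello World and his Dog!");
}
```

    With MY_UNICODE defined, greeting() returns the same value as the std::wstring initialized from L"Hello World and his Dog!" earlier in the thread.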

    >> for the first time I need to mix ANSI and UNICODE
    Why - out of curiosity?

    gg
