Wide string to Ansi String

08-07-2004
Davros

Wide string to Ansi String

Is there any standard routine for converting wide strings into ansi strings?
08-07-2004
Salem

What's an ANSI string?
Do you mean an ASCII string?

You could use UTF-8 to encode the wide string in an ASCII string.
08-07-2004
Codeplug

wcstombs()
WideCharToMultiByte()

gg
08-07-2004
Davros

I simply want the reverse of _T, which is commonly used in Windows.

Is this something I need to write?

I mean to simply strip off the high-order bytes. So effectively, only the first 256 bytes can be translated, and '?' or something similar is shown when the character has a value larger than 255.
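
A minimal sketch of that "strip the high-order bytes" idea, written by hand so it does not depend on any locale. The function name narrow_lossy and the fallback parameter are made up for illustration. Note the assumption it bakes in: values 0-255 are copied through unchanged, which matches Latin-1 (ISO-8859-1); the Windows "ANSI" code page 1252 differs from that in the 0x80-0x9F range, so this is not an exact ANSI conversion.

```cpp
#include <string>

// Hypothetical helper: copies wide characters with values 0..255 as single
// bytes and substitutes `fallback` for everything else.
std::string narrow_lossy(const std::wstring& in, char fallback = '?')
{
    std::string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        // Convert to an unsigned type first so the range check behaves the
        // same whether wchar_t is signed or unsigned on this platform.
        unsigned long code = static_cast<unsigned long>(wc);
        if (code <= 0xFF)
            out.push_back(static_cast<char>(static_cast<unsigned char>(code)));
        else
            out.push_back(fallback);
    }
    return out;
}
```

For example, narrow_lossy(L"abc") gives "abc", and a wide string holding the single character 0x20AC (the euro sign) comes back as "?".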

Thanks for the multi-byte suggestion, but I don't think it's suitable because I want to preserve ANSI characters (i.e. those outside ASCII).
08-07-2004
Codeplug

>> I simply want the reverse of _T...
_T(x) is a macro that will put "L" in front of "x" if _UNICODE is defined. Otherwise it simply evaluates to "x".
Don't complicate your code by using _T(), TEXT(), or TCHARs unless you actually need to support both a UNICODE and an ANSI target at the same time.

>> Is this something I need to write?
wcstombs() and mbstowcs() are what you use to convert wchar_t strings to and from char strings. They use the code page setting of the current locale to determine what type of "multibyte-character" to use. The default locale is "C", which uses the ANSI character set and a 1 byte character encoding. So a "multi-byte" char string under the "C" locale is simply a null-terminated C-string.
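
A rough sketch of wrapping wcstombs() for std::wstring, in case it helps. The name wide_to_narrow is made up; only wcstombs() and MB_CUR_MAX come from the standard library. Which characters convert successfully depends on the current locale and on the C runtime, so treat a false return as "this locale cannot represent that string".

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical wrapper: converts `in` to a char string using the current
// C locale. Returns false if wcstombs() reports an unconvertible character.
// Conversion stops at the first embedded null character, as with any C string.
bool wide_to_narrow(const std::wstring& in, std::string& out)
{
    // Worst case is MB_CUR_MAX bytes per wide character, plus the terminator.
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;
    out.assign(buf.data(), n); // n excludes the terminating null
    return true;
}
```

Typical use would be: std::string s; if (wide_to_narrow(L"Hello", s)) { /* use s */ }.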

>> I mean to simply strip off the high-order bytes.
That's basically what it does under the "C" locale.

>> So effectively, only the first 256 bytes can be translated...
Since the "C" locale uses the ANSI character set, which uses the values 0-255, yes (under the "C" locale). You can get different behavior under different locales. And you have a lot more conversion options when using the WideCharToMultiByte() and MultiByteToWideChar() APIs if you need them.

>> I want to preserve ANSI characters (i.e. those outside ASCII).
Both ANSI and extended-ASCII character codes range from 0-255. 7-bit ASCII is 0-127. In other words, any character code between 0-255 will be preserved under the "C" locale when using mbstowcs() and wcstombs().

gg
08-07-2004
Davros

I'm glad I asked this question - for the first time I need to mix ANSI and UNICODE. Thanks for the reply CodePlug.

>_T(x) is a macro that will put "L" in front of "x"

OK. What does the definition of _T look like? What is the syntax for placing 'L' in front of 'x'? Is it something like this?

std::wstring s = L"Hello World and his Dog!";

>Both ANSI and extended-ASCII character codes range from 0-255. 7-bit ASCII is 0-127. In other words, any character code between 0-255 will be preserved under the "C" locale when using mbstowcs() and wcstombs().

I was of the understanding that only ASCII (0-127) were preserved. Thanks for telling me otherwise.

>You can get different behavior under different locales.

Am I correct in thinking the C locale is default (irrespective of the platform) and that I would have to change the locale explicitly in my code for it to be otherwise?

What I am asking is: if I make assumptions for the C locale, is my program going to come unstuck when run on a Russian PC?
08-07-2004
Davros

To follow on from my last question, what I'm really concerned about is that my streams use the C locale by default, as defined by:

ios_base::getloc()

Is this the case? From the C++ documentation it appears so, but I'm not 100% sure.
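
One way to check this on a given compiler and platform, rather than relying on the documentation alone, is to ask a stream and the global locale for their names. This is only a sketch: the two function names are made up, while ios_base::getloc(), std::locale() and locale::name() are standard.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: name of the locale a freshly constructed stream uses.
std::string new_stream_locale_name()
{
    std::ostringstream os;
    return os.getloc().name(); // getloc() is inherited from ios_base
}

// Hypothetical helper: name of the current global C++ locale.
std::string global_locale_name()
{
    return std::locale().name();
}
```

Printing new_stream_locale_name(), global_locale_name() and std::cout.getloc().name() at the top of main() shows what the program actually starts with on that machine.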
08-08-2004
Codeplug

Well, after some research, the standard does not mandate what the default global locale will be at startup, and it will most likely be the default locale for your country/region, which makes sense.

As far as the "C" locale is concerned, I think the only guarantee on character glyphs is that you'll get the English, 7-bit, printable, ASCII glyphs (32-126) and a few control codes for standard escape sequences '\a', '\b', '\t', etc...

Other than that, you are relying on a particular OS or standard C/C++ implementation. For example, the "C" locale under Microsoft's C runtime uses the extended ASCII character set.

You can look up the _T() macro in tchar.h, and yes, it will prepend an "L" to your string literals if _UNICODE is defined.
The TEXT() macro is part of the platform SDK and does the exact same thing except it uses the UNICODE define (no leading underscore).
The reason there are two of them is that tchar.h provides a TCHAR interface to most of the MS C library, and they didn't want to make it dependent on any platform SDK defines or macros.
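
On the question of what such a definition looks like, here is a simplified sketch of the pattern, not the actual contents of tchar.h. MY_UNICODE, MY_T, MY_T_IMPL, my_tchar and greeting() are made-up names so they cannot clash with the real ones. The "L" is attached with the preprocessor's ## token-pasting operator; the extra macro level lets an argument that is itself a macro be expanded before the L is pasted on.

```cpp
// Hypothetical stand-ins for _UNICODE, _T() and TCHAR.
#ifdef MY_UNICODE
  #define MY_T_IMPL(x) L##x      // "abc" becomes L"abc"
  typedef wchar_t my_tchar;
#else
  #define MY_T_IMPL(x) x         // "abc" stays "abc"
  typedef char my_tchar;
#endif
#define MY_T(x) MY_T_IMPL(x)

// The same source line compiles to a wide or a narrow literal depending on
// whether MY_UNICODE is defined.
const my_tchar* greeting()
{
    return MY_T("Hello World and his Dog!");
}
```

So with MY_UNICODE defined, MY_T("Hello") ends up as L"Hello", which is the same syntax as the std::wstring example quoted earlier in the thread.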

>> for the first time I need to mix ANSI and UNICODE
Why - out of curiosity?

gg