I tried to printf("®") to the console but I get the « character instead. Anyone know how to do it?
I don't see that character listed here: Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion
So I'm guessing you're SOL.
"I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008
"the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010
It's not an ASCII (i.e. 8-bit) character, it's Unicode. See here: Unicode Character 'REGISTERED SIGN' (U+00AE). There's even a C example there (among others).
Well, the problem is that you can't really display Unicode characters to the console. There are ways, but they are not portable. Linux's console uses UTF-8; Windows's uses UTF-16. Other systems may use other encodings. And even if you do output the bytes for the proper encoding of the character, you can only hope the console has a font that can display it.
That said, this works *for me* in Linux:
Code:
#include <stdio.h>

int main(void)
{
    printf("\xC2\xAE\n");  /* the two UTF-8 bytes of U+00AE */
    return 0;
}
Yes, of course it relies on what you're running it in (Unix shell, Windows command prompt), as well as the font being used. Most Unixes support Unicode very well out of the box, so the method in the example in the link I provided above should work without any modifications.
Windows Unicode command-line support is very bad, I think, from my experience having worked with it for a number of months on a project. I don't think the Windows command prompt uses UTF-16 (BE or LE) by default. I think it uses ASCII and "code pages", which is not Unicode (and therefore not any form of UTF). In addition, the default Windows command-line font does not really support Unicode, as you mentioned.
Isn't there support for wide characters in the standard with the wchar_t data type?
Never mind, I just saw that it's up to individual implementations.
I'm not sure how those code-page .......... work; I never code in Windows. But the one time I tried to figure it out, I managed to output the characters I wanted by outputting UTF-16. But maybe I was just lucky that it was in the right code page.
Damn, what idiot thinks of that crap?
It's possible you were able to print it "incorrectly", even though it appeared to work. That is, maybe the code page you were using, the encoding, the font, the decimal/hexadecimal value you used, etc., all lined up to print the exact character you wanted.
Code pages are basically different encoded subsets of the Unicode character set. So if the program prints the value 0x123 (say) and it should print the (Unicode) character "X" (say), it might work, depending on the variables mentioned above. In code page "A" it might print the correct character, if the encoded value 0x123 maps to the character X. In other code pages some other character, say "Y", might be mapped to that value.
There's also certain code pages in Windows that represent, say, UTF-8. This means that, when you're on that code page, you know exactly what value it must print. However, this kind of defeats the purpose of Unicode--you just want to print some character and not worry about "code pages". So code pages are slowly being phased out, so that when you want to print a (R), you just give its Unicode value, according to whatever UTF encoding you're using.
Last edited by nadroj; 10-31-2009 at 09:34 AM. Reason: grammar
Unicode (UTF16LE) output to the console is possible via WriteConsoleW. If you are using a recent MS CRT (at least VS 2008 I think) then you can call "_setmode(_fileno(stdout), _O_U16TEXT)", which will cause "wide" output to stdout to be written directly as UTF16LE (via WriteConsoleW).
None of this will help if the console isn't using a Unicode font like Lucida Console. But even that font doesn't support all Unicode characters. You can use charmap.exe to see which Unicode characters it does support.
gg
Last edited by Codeplug; 10-31-2009 at 11:04 AM.
Code:
#include <wchar.h>

int main(void)
{
    putwchar(0xAE);  /* wide character U+00AE */
    return 0;
}
It is too clear and so it is hard to see.
A dunce once searched for fire with a lighted lantern.
Had he known what fire was,
He could have cooked his rice much sooner.
That doesn't really work in general, but Linux will do clever things for you...
The locale determines what the multibyte representation is - that's the "C" locale by default. On Linux (glibc), the wide character representation is UTF-32[LE/BE]. Passing 0xAE to wcrtomb with a C locale results in "(R)" - which is fairly clever. If I call setlocale(LC_ALL, "") first, then the default user locale is used, which is typically UTF-8. Then the wcrtomb conversion (UTF-32 -> UTF-8) results in a "®".
On Windows, the MSCRT treats "C locale" characters as whatever the default console codepage is. In other words, multibyte character values are just indexes into the default codepage. Changing the locale (LC_CTYPE) under Windows is just changing what codepage to use. Wide characters in Windows are UTF16LE. Calling wcrtomb under the C locale will simply "assign" each wchar_t to a char. Any wchar_t greater than 0xFF results in an error. So in the end, we get a codepage index of 0xAE. My default codepage is 437 - http://msdn.microsoft.com/en-us/goglobal/cc305156.aspx As you can see, index 0xAE is "«", or U+00AB.
gg
So now the real question is: how do we portably write programs that support Unicode input/output to a console window?
Guess JAVA does have its uses... But that's MS's fault.
>> how do we portably write programs that support Unicode input/output
Portably speaking, you don't. Encoding is based on the locale which is typically abstracted away from the programmer. The only characters you can count on in any environment are the "basic character set" characters. This gives you A-Z, a-z, 0-9, "! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~", space, and the standard escape sequences (alert, tab, etc...). Those characters are available regardless of the current LC_CTYPE.
I believe the standard even allows the wide character representation to change when the LC_CTYPE is changed. In reality though, Windows uses UTF16LE and *nix (glibc) uses UTF32[LE/BE]. But as 7.19.3 - 12 describes, wide character output is converted (I think of it as "normalized") to its multibyte representation first.
The first assumption you could make is that the user's default locale can handle whatever wchar_t's you throw at it:
On Linux with a UTF8 locale and compatible terminal, U+00AE gets converted and written as bytes: "\xC2\xAE".
Code:
#include <wchar.h>
#include <locale.h>

int main()
{
    setlocale(LC_ALL, "");
    wprintf(L"\u00ae\n");
    return 0;
}
Windows does not support UTF8 as a locale's multibyte encoding. It only supports codepages. On Windows (with 2008 CRT), the above program gives me an "r". Another thing to understand under Windows is that there are two codepages being considered for output to the console. First, the UTF16 character is converted to the codepage character associated with the current locale. The default user locale uses the ansi-codepage (as returned by GetACP). For me that's 1252, which supports "®". However, the second codepage you must consider is the console codepage (as returned by GetConsole[Output]CP). The MSCRT will do one last conversion to this codepage before calling WriteFile on the standard output handle. For me, that's a conversion from 1252 to 437, or "®" to "r".
Based on this knowledge, your next attempt on Windows might be:
Now we've set both the input and output console CP to the ACP. (For some reason, the 2008 CRT uses the input CP for conversion before output...) This basically eliminates the secondary conversion before output - which results in a "®" on my system.
Code:
#include <wchar.h>
#include <locale.h>
#include <windows.h>

int main()
{
    setlocale(LC_ALL, "");
    SetConsoleOutputCP(GetACP());
    SetConsoleCP(GetACP());
    wprintf(L"\u00ae\n");
    return 0;
}
There is still a problem with this approach - you can only use characters supported by the ACP (and you can't change the ACP). For true Unicode support, you want to use the WriteConsoleW API and bypass any codepage conversions. This can be accomplished with the 2008 CRT with the following:
Here, U+00AE is sent directly to WriteConsoleW without conversion. Then you just need a console font that has a glyph for U+00AE.
Code:
#include <stdio.h>
#include <wchar.h>
#include <fcntl.h>
#include <io.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"\u00ae\n");
    return 0;
}
gg