Thread: how do i printf("®") to the console?

  1. #1
    Registered User
    Join Date
    Oct 2009
    Posts
    1

    how do i printf("®") to the console?

    I tried to printf("®") to the console but I get the "«" character instead. Anyone know how to do it?

  2. #2
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    I don't see that character listed here: Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion
    So I'm guessing you're SOL.
    "I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008

    "the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010

  3. #3
    Registered User
    Join Date
    Oct 2006
    Location
    Canada
    Posts
    1,243
    It's not an ASCII (i.e. 7-bit) character, it's Unicode. See here: Unicode Character 'REGISTERED SIGN' (U+00AE). There's even a C example (among others).

  4. #4
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Quote Originally Posted by nadroj View Post
    It's not an ASCII (i.e. 7-bit) character, it's Unicode. See here: Unicode Character 'REGISTERED SIGN' (U+00AE). There's even a C example (among others).
    Well, the problem is that you can't really display Unicode characters to the console. There are ways, but they are not portable. Linux's console uses UTF-8. Windows's uses UTF-16. Other systems may use other encodings. And even if you do output the correct bytes for the console's encoding of the character, you can only hope the console has a font that can display the character.

    That said, this works *for me* in Linux:
    Code:
    #include <stdio.h>
    
    int main(void)
    {
      /* UTF-8 byte sequence for U+00AE (REGISTERED SIGN) */
      printf("\xC2\xAE\n");
      return 0;
    }

  5. #5
    Registered User
    Join Date
    Oct 2006
    Location
    Canada
    Posts
    1,243
    Yes, of course it depends on what you're running it in (Unix shell, Windows command prompt), as well as the font being used. Most Unixes support Unicode very well out of the box, so the method in the example in the link I provided above should work without any modifications.

    Windows Unicode command-line support is very bad, I think, from my experience having worked with it for a number of months on a project. I don't think the Windows command prompt uses UTF-16 (BE or LE) by default. I think it uses ASCII and "code pages", which is not Unicode (and therefore not any form of UTF). In addition to this, the default Windows command-line font does not really support Unicode, as you mentioned.
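
    If you want to check what the prompt is actually using, a minimal sketch like this (assuming a Windows build environment) just asks the console for its current code pages; on a default US install it reports 437 rather than any UTF encoding:
    Code:
    #include <stdio.h>
    #include <windows.h>
    
    int main(void)
    {
        /* Report the code pages the console is currently using for input and output */
        printf("input  code page: %u\n", GetConsoleCP());
        printf("output code page: %u\n", GetConsoleOutputCP());
        return 0;
    }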

  6. #6
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Isn't there support for wide characters in the standard with the wchar_t data type?

    Never mind, I just saw that it's up to individual implementations.

  7. #7
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Quote Originally Posted by nadroj View Post
    Windows Unicode command-line support is very bad, I think, from my experience having worked with it for a number of months on a project. I don't think the Windows command prompt uses UTF-16 (BE or LE) by default. I think it uses ASCII and "code pages", which is not Unicode (and therefore not any form of UTF). In addition to this, the default Windows command-line font does not really support Unicode, as you mentioned.
    I'm not sure how that code-page .......... works; I never code on Windows. But the one time I tried to figure it out, I managed to output the characters I wanted by outputting UTF-16. But maybe I was just lucky that it was in the right code page.
    Damn, what idiot thinks of that crap?

  8. #8
    Registered User
    Join Date
    Oct 2006
    Location
    Canada
    Posts
    1,243
    It's possible you were able to print it "incorrectly", even though it appeared to work. That is, maybe the code page you were using, the encoding, the font, the decimal/hexadecimal value you used, etc., all lined up to print the exact character you wanted.

    Code pages are basically different encoding subsets of the Unicode character set. So if the program prints the value 0x123 (whatever) and it should print the (Unicode) character "X" (whatever), it might work, depending on the variables mentioned above. In code page "A" it might print the correct character, if the encoded value 0x123 maps to the character X. In other code pages some other character, say "Y", might be mapped to that value.

    There are also certain code pages in Windows that represent, say, UTF-8. This means that, when you're on that code page, you know exactly what value it must print. However, this kind of defeats the purpose of Unicode--you just want to print some character and not worry about "code pages". So code pages are slowly being phased out, so that when you want to print a (R), you just give its Unicode value, according to whatever UTF encoding you're using.
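
    As a rough sketch of that UTF-8 code page idea (65001 is the Windows UTF-8 code page; older consoles and CRTs handle it poorly, so treat this as illustrative only):
    Code:
    #include <stdio.h>
    #include <windows.h>
    
    int main(void)
    {
        /* Switch console output to code page 65001 (UTF-8) */
        SetConsoleOutputCP(65001);
        /* Print the UTF-8 byte sequence for U+00AE (REGISTERED SIGN) */
        printf("\xC2\xAE\n");
        return 0;
    }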
    Last edited by nadroj; 10-31-2009 at 09:34 AM. Reason: grammar

  9. #9
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Unicode (UTF16LE) output to the console is possible via WriteConsoleW. If you are using a recent MS CRT (at least VS 2008 I think) then you can call "_setmode(_fileno(stdout), _O_U16TEXT)", which will cause "wide" output to stdout to be written directly as UTF16LE (via WriteConsoleW).

    None of this will help if the console isn't using a Unicode font like Lucida Console. But even that doesn't support all Unicode characters. You can use charmap.exe to see which Unicode characters it does support.
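
    For what it's worth, here's a minimal sketch of the WriteConsoleW route (it only works when stdout really is a console, not a redirected file or pipe):
    Code:
    #include <windows.h>
    
    int main(void)
    {
        /* UTF-16LE text containing U+00AE followed by a newline */
        const wchar_t text[] = L"\u00AE\r\n";
        DWORD written = 0;
    
        /* Write directly to the console, bypassing the CRT's code page conversions */
        HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
        WriteConsoleW(out, text, (DWORD)(sizeof(text) / sizeof(text[0]) - 1), &written, NULL);
        return 0;
    }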

    gg

  10. #10
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Code:
    #include <wchar.h>
    
    int main(){
      putwchar(0xAE);
      return 0;
    }
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  11. #11
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Quote Originally Posted by King Mir View Post
    Code:
    #include <wchar.h>
    
    int main(){
      putwchar(0xAE);
      return 0;
    }
    That doesn't really work in general, but Linux will do clever things for you...
    Quote Originally Posted by ISO/IEC 9899:1999 (E)
    7.19.3 - 12
    The wide character output functions convert wide characters to multibyte characters and write them to the stream as if they were written by successive calls to the fputwc function. Each conversion occurs as if by a call to the wcrtomb function, with the conversion state described by the stream’s own mbstate_t object. The byte output functions write characters to the stream as if by successive calls to the fputc function.
    The locale determines what the multibyte representation is - that's the "C locale" by default. On Linux (glibc), the wide character representation is UTF32[LE/BE]. Passing 0xAE to wcrtomb, with a C locale, results in "(R)" - which is fairly clever. If I call setlocale(LC_ALL, "") first, then the default user locale is used, which is typically UTF8. Then the wcrtomb conversion (UTF32->UTF8) results in a "®".
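
    A small sketch of that wcrtomb step (the C2 AE bytes in the comment assume a UTF8 user locale; other locales will give different results):
    Code:
    #include <limits.h>   /* MB_LEN_MAX */
    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>
    
    int main(void)
    {
        char buf[MB_LEN_MAX];
        mbstate_t state = {0};
    
        /* Use the user's default locale (e.g. en_US.UTF-8) instead of the "C" locale */
        setlocale(LC_ALL, "");
    
        /* Convert the wide character U+00AE to its multibyte form in the current locale */
        size_t n = wcrtomb(buf, 0xAE, &state);
        if (n != (size_t)-1) {
            /* Under a UTF-8 locale this prints: C2 AE */
            for (size_t i = 0; i < n; i++)
                printf("%02X ", (unsigned char)buf[i]);
            printf("\n");
        }
        return 0;
    }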

    On Windows, the MSCRT treats "C locale" characters as whatever the default console codepage is. In other words, multibyte character values are just indexes into the default codepage. Changing the locale (LC_CTYPE) under Windows just changes which codepage to use. Wide characters on Windows are UTF16LE. Calling wcrtomb under the C locale will simply "assign" each wchar_t to a char; any wchar_t greater than 0xFF results in an error. So in the end, we get a codepage index of 0xAE. My default codepage is 437 - http://msdn.microsoft.com/en-us/goglobal/cc305156.aspx As you can see, index 0xAE is "«", or U+00AB.

    gg

  12. #12
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    So now the real question is: how do we portably write programs that support Unicode input/output to a window?
    Guess Java does have its uses... But that's MS's fault.

  13. #13
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> how do we portably write programs that support Unicode input/output
    Portably speaking, you don't. Encoding is based on the locale which is typically abstracted away from the programmer. The only characters you can count on in any environment are the "basic character set" characters. This gives you A-Z, a-z, 0-9, "! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~", space, and the standard escape sequences (alert, tab, etc...). Those characters are available regardless of the current LC_CTYPE.

    I believe the standard even allows the wide character representation to change when the LC_CTYPE is changed. In reality though, Windows uses UTF16LE and *nix (glibc) uses UTF32[LE/BE]. But as 7.19.3 - 12 describes, wide character output is converted (I think of it as "normalized") to its multibyte representation first.

    The first assumption you could make is that the user's default locale can handle whatever wchar_t's you throw at it:
    Code:
    #include <wchar.h>
    #include <locale.h>
    
    int main()
    {
        setlocale(LC_ALL, "");
        wprintf(L"\u00ae\n");
        return 0;
    }
    On Linux with a UTF8 locale and compatible terminal, U+00AE gets converted and written as bytes: "\xC2\xAE".

    Windows does not support UTF8 as a locale's multibyte encoding. It only supports codepages. On Windows (with 2008 CRT), the above program gives me an "r". Another thing to understand under Windows is that there are two codepages being considered for output to the console. First, the UTF16 character is converted to the codepage character associated with the current locale. The default user locale uses the ansi-codepage (as returned by GetACP). For me that's 1252, which supports "®". However, the second codepage you must consider is the console codepage (as returned by GetConsole[Output]CP). The MSCRT will do one last conversion to this codepage before calling WriteFile on the standard output handle. For me, that's a conversion from 1252 to 437, or "®" to "r".

    Based on this knowledge, your next attempt on Windows might be:
    Code:
    #include <wchar.h>
    #include <locale.h>
    #include <windows.h>
    
    int main()
    {
        setlocale(LC_ALL, "");
        SetConsoleOutputCP(GetACP());
        SetConsoleCP(GetACP());
        wprintf(L"\u00ae\n");
        return 0;
    }
    Now we've set both the input and output console CP to the ACP. (For some reason, the 2008 CRT uses the input CP for conversion before output...) This basically eliminates the secondary conversion before output - which results in a "®" on my system.

    There is still a problem with this approach - you can only use characters supported by the ACP (and you can't change the ACP). For true Unicode support, you want to use the WriteConsoleW API and bypass any codepage conversions. This can be accomplished with the 2008 CRT with the following:
    Code:
    #include <stdio.h>
    #include <fcntl.h>
    #include <io.h>
    
    int main()
    {
        _setmode(_fileno(stdout), _O_U16TEXT);
        wprintf(L"\u00ae\n");
        return 0;
    }
    Here, U+00AE is sent directly to WriteConsoleW without conversion. Then you just need a console font that has a glyph for U+00AE.

    gg
