Thread: Struggling to print a wide character

  1. #1
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657

    Struggling to print a wide character

    This is what I got:
    Code:
    	wchar_t mizuL[] = L"⺢";
    	char8_t mizu8[] = u8"⺢";
    	char16_t mizu16[] = u"⺢";
    	char32_t mizu32[] = U"⺢";
    ...
    	(void)wprintf(L"mizuL: \"%%ls\" = \"%ls\"\n", mizuL );
    	(void)wprintf(L"mizuL: '%%c' = '%lc'\n", mizuL[0] );
    	(void)wprintf(L"mizu8: '%%c' = '%lc'\n", mizu8[0] );
    	(void)wprintf(L"mizu16: '%%c' = '%lc'\n", mizu16[0] );
    	(void)wprintf(L"mizu32: '%%c' = '%lc'\n", mizu32[0] );
    This is my output:
    Code:
    mizuL: "%ls" = "?"
    mizuL: '%c' = '?'
    mizu8: '%c' = '?'
    mizu16: '%c' = '?'
    mizu32: '%c' = '?'
    Any ideas why it's not printing correctly? (BTW, I have already checked whether it will display the mizu character, which it does.)

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    27,271
    Perhaps check the character encoding of the command prompt window/terminal.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    It said UTF-8 when I looked at it; both the Geany terminal and the normal terminal give the same result.

  4. #4
    Registered User
    Join Date
    Feb 2019
    Posts
    557
    Quote Originally Posted by awsdert View Post
    Said UTF-8 when I looked at it, both geany terminal and normal terminal give same result
    This is probably your text file's encoding. Are you trying to print this on Windows? This character cannot be represented in the WINDOWS-1252 encoding:

    The test.c (utf-8/unix) file:
    Code:
    char str[] = "⺢";
    Trying to convert to WINDOWS-1252 encoding...
    Code:
    $ iconv -f UTF-8 -t WINDOWS-1252 test.c -o test2.c
    iconv: illegal input sequence at position 14
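    On Windows you'll need to call the MultiByteToWideChar() API function to convert from UTF-8 to a wide string (UTF-16 on Windows):
    Code:
    /* test.c - saved as UTF-8 */
    #include <windows.h>
    
    char str[] = "⺢";
    wchar_t buffer[16];   /* the destination must be wide characters, not char */
    
    /* See MultiByteToWideChar() on MSDN.
       The last argument is a count of wchar_t elements, not bytes. */
    if ( ! MultiByteToWideChar( CP_UTF8, 0, str, -1, buffer, sizeof buffer / sizeof buffer[0] ) )
    { ... error handling ... }
    /* here buffer[] will hold the converted string */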

  5. #5
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    I'm on Linux Mint. I'll check the encoding when I get home; I think it's also UTF-8 though, since I left it on the default.

  6. #6
    null pointer Structure's Avatar
    Join Date
    May 2019
    Posts
    148
    Any ideas
    wtf is %% ?


  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    27,271
    Quote Originally Posted by Structure View Post
    wtf is %% ?
    Doubling the % is the way to escape it.
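
As a minimal sketch of that escaping rule (the helper name format_label is made up for illustration): "%%" in a format string emits one literal percent sign, while a single "%c" is still treated as a conversion.

```c
#include <stdio.h>

/* Hypothetical helper: builds a label like the ones in the first post.
 * "%%c" is written out as the two characters "%c";
 * the later "%c" is a real conversion that consumes the argument c. */
int format_label(char *buf, size_t size, char c)
{
    return snprintf(buf, size, "'%%c' = '%c'", c);
}
```

Calling format_label(buf, sizeof buf, 'x') should leave the text '%c' = 'x' in buf. The same rule applies to the wide format strings passed to wprintf().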

  8. #8
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    Well, according to Geany my file is UTF-8. I've also tried swapping out "⺢" for "\u2ea2", and still no luck.

    Edit: Side note, all my display fonts are set to "Noto Sans <VARIANT> JP", all of which contain the mizu character
    Last edited by awsdert; 09-13-2019 at 03:51 PM.
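
One possible cause that the thread never rules out: a C program starts in the "C" locale, and wprintf() converts wide characters to bytes according to the program's current locale, not the terminal's setting. Characters that the current locale cannot encode may then come out as '?'. Below is a sketch of a check for this and of the usual remedy, calling setlocale(LC_ALL, "") before any output. The helper names are made up for illustration, and this has not been tested against the original program.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns 1 if the wide string can be converted to the multibyte encoding
 * of the currently selected locale, 0 otherwise. wprintf() has to perform
 * the same kind of conversion when it writes wide characters to stdout. */
int convertible_in_current_locale(const wchar_t *ws)
{
    char buf[64];
    return wcstombs(buf, ws, sizeof buf) != (size_t)-1;
}

/* Hypothetical helper: adopt the locale from the environment
 * (LANG / LC_* variables), then print the character from the first post. */
int print_mizu(void)
{
    /* Without this call the program stays in the "C" locale. */
    if (setlocale(LC_ALL, "") == NULL)
        return -1;
    return wprintf(L"mizuL: \"%ls\"\n", L"\u2ea2");
}
```

If the environment selects a UTF-8 locale, print_mizu() would be expected to write the bytes E2 BA A2 for the character. Whether the glyph then renders still depends on the terminal and its font.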

  9. #9
    null pointer Structure's Avatar
    Join Date
    May 2019
    Posts
    148
    Doubling the % is the way to escape it.
    makes sense.


  10. #10
    Registered User
    Join Date
    Feb 2019
    Posts
    557
    Quote Originally Posted by awsdert View Post
    Well according to geany my file is UTF-8, I've also tried swapping out "⺢" for "\u2ea2" and still no luck
    Your terminal must be able to decode UTF-8 as well. Take a look at this file:
    Code:
    Test: áãçô
    In Linux (which usually uses UTF-8) and in Windows, this is encoded as:
    Code:
    $ hd test-linux.txt
    00000000  54 65 73 74 3a 20 c3 a1  c3 a3 c3 a7 c3 b4 0a     |Test: .........|
    0000000f
    
    $ hd test-windows.txt
    00000000  54 65 73 74 3a 20 e1 e3  e7 f4 0d 0a              |Test: ......|
    0000000c
    Notice that the "special" chars, the ones with accents, are encoded differently. "á", in UTF-8, is "\xc3\xa1", but in WINDOWS-1252 (and in ISO-8859-1, also known as Latin-1) it is "\xe1".
    If you try to print test-linux.txt on Windows, you'll get "Test: Ã¡Ã£Ã§Ã´". If you try to print test-windows.txt on Linux (or any UTF-8 terminal), you'll get "Test: ����".
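
The byte difference between the two dumps can be reproduced with a small conversion routine. This is a minimal sketch that assumes plain ISO-8859-1 input, where each byte value equals its Unicode code point; it does not handle the extra WINDOWS-1252 characters in the 0x80 to 0x9F range. The function name is made up for illustration.

```c
#include <stddef.h>

/* Convert n ISO-8859-1 bytes to UTF-8.
 * dst must have room for 2*n + 1 bytes (worst case plus the NUL).
 * Returns the number of bytes written, not counting the NUL. */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII is identical in both encodings. */
            dst[j++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte sequence: 110xxxxx 10xxxxxx. */
            dst[j++] = (unsigned char)(0xC0 | (c >> 6));
            dst[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    dst[j] = '\0';
    return j;
}
```

Feeding it the four bytes e1 e3 e7 f4 from test-windows.txt yields c3 a1 c3 a3 c3 a7 c3 b4, the sequence seen in test-linux.txt.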

    PS: By the way, have you ever noticed some sites showing these black diamond question marks? Usually the HTML is declared as "UTF-8" (using "<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />") but the file is actually encoded in ISO-8859-1 (or WINDOWS-1252) -- edited, of course, on Windows!

    By default, on Linux systems, editors like Vim and Emacs create UTF-8 encoded files (the same encoding as the terminal).

    If you are using Windows and saving your file in UTF-8, your chars will be encoded in UTF-8, but this doesn't mean your terminal is capable of displaying UTF-8 encoded chars, hence the MultiByteToWideChar() Windows API call I showed before....
    Last edited by flp1969; 09-13-2019 at 04:21 PM.

  11. #11
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    Already said I'm using linux mint, why do you keep going back to windows stuff? I'll adapt to that after getting my code working on linux. Also I said before that my terminal reports UTF-8 as well

  12. #12
    Registered User
    Join Date
    Feb 2019
    Posts
    557
    Note: There is no portable way to find out which encoding the terminal actually supports. On Linux you can do:
    Code:
    $ locale charmap
    Or look at one of the LC_ environment variables (or LANG)... On any terminal, you can also try Python:
    Code:
    $ python -c 'import sys; print(sys.stdout.encoding)'
    This should work for Windows as well...
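
The same check can be made from inside a C program. A minimal sketch, assuming a POSIX system (nl_langinfo() is not part of standard C and is not available in the Windows C runtime); the function name is made up for illustration:

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the character-type locale named by the
 * environment (LANG / LC_* variables) and return the name of its
 * character encoding, for example "UTF-8". */
const char *environment_codeset(void)
{
    /* If this call fails, the program simply stays in the "C" locale. */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

This reports what the environment variables claim, which is the same information "locale charmap" prints. It cannot verify what the terminal emulator really decodes.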

    My tip: use the SAME encoding your terminal uses when compiling your code... But, as I showed before, "⺢" cannot be encoded in WINDOWS-1252 or ISO-8859-1.

    Ahhhh... Since Windows 7 there is a code page 65001 (which is UTF-8). Maybe you can try changing to that one (if you are using Windows)...

  13. #13
    Registered User
    Join Date
    Feb 2019
    Posts
    557
    Quote Originally Posted by awsdert View Post
    Already said I'm using linux mint, why do you keep going back to windows stuff? I'll adapt to that after getting my code working on linux. Also I said before that my terminal reports UTF-8 as well
    Then, check your file with 'hd' and see if you get the sequence E2 BA A2 for '⺢'.... And, is this what you get?
    Code:
    $ echo -n '⺢' | hd
    00000000  e2 ba a2                                          |...|
    00000003
    If it is, then there is another explanation for the wrong char being shown: you forgot to install a font which has the Japanese glyphs.

    Here is what I have installed (plus others, like an Egyptian font for hieroglyphs):
    Code:
    $ dpkg -l | grep Japanese
    ii  fonts-ipaexfont-gothic                     00301-4ubuntu1                                      all          Japanese OpenType font, IPAex Gothic Font
    ii  fonts-ipaexfont-mincho                     00301-4ubuntu1                                      all          Japanese OpenType font, IPAex Mincho Font
    ii  fonts-ipafont-gothic                       00303-18ubuntu1                                     all          Japanese OpenType font set, IPA Gothic and IPA P Gothic Fonts
    ii  fonts-ipafont-mincho                       00303-18ubuntu1                                     all          Japanese OpenType font set, IPA Mincho and IPA P Mincho Fonts
    ii  fonts-takao-pgothic                        00303.01-2ubuntu1                                   all          Japanese TrueType font set, Takao P Gothic Fonts
    Last edited by flp1969; 09-13-2019 at 04:46 PM.

  14. #14
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    Yep, I got the same output as you gave. I then ran 'hd char.c' on my file and found the exact same sequence there. Why is it a different sequence from what the character map application is showing?
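
On the question of the differing sequences: a character map usually lists the Unicode code point (U+2EA2 for this character, matching the "\u2ea2" escape tried earlier), whereas hd shows the bytes of its UTF-8 encoding. A minimal sketch of how one maps to the other, with a made-up function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the byte count,
 * or 0 for surrogates and for values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For 0x2EA2 this produces the three bytes E2 BA A2, the sequence hd prints, so the file contents and the character map agree even though the numbers look different.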

    Edit: didn't notice your edit, as I noted before I have noto fonts being used and all the ones I've selected are specifically the japanese variants.

    Edit 2: Tried running that command you gave and got this:
    Code:
    dpkg -l | grep Japanese
    ii  fonts-takao-pgothic                        00303.01-2ubuntu1                   all          Japanese TrueType font set, Takao P Gothic Fonts
    Any ideas why noto fonts didn't show there despite having japanese supporting variants?
    Last edited by awsdert; 09-13-2019 at 04:59 PM.

  15. #15
    Registered User
    Join Date
    Feb 2019
    Posts
    557
    Quote Originally Posted by awsdert View Post
    Edit 2: Tried running that command you gave and got this:
    Code:
    dpkg -l | grep Japanese
    ii  fonts-takao-pgothic                        00303.01-2ubuntu1                   all          Japanese TrueType font set, Takao P Gothic Fonts
    Any ideas why noto fonts didn't show there despite having japanese supporting variants?
    Well... I have no idea what 'noto' is. I can only suggest installing the other fonts...

    I am Brazilian and just think the ideograms are cute. I don't understand the language, awsdert-san!
