Thread: Non-English characters with cout

  1. #1
    Do you C what I C? jamesallen4u's Avatar
    Join Date
    Oct 2011
    Posts
    43

    Question Non-English characters with cout

    Hello,

    How would you do something like this using cout in C++?

    Code:
    cout << "привет мир";
    (The Russian Equivalent of "Hello World")

    I tried copy-pasting this into VC++, but as I had expected, the compiler gave me problems. I tried the same thing in an online IDE (Ideone.com) and it ran perfectly, which confused me. Is there a procedure you can follow, just as you would declare a char array for extended ASCII, or is there a special include file that would allow me to use non-English characters? Thanks for your help.
    Linux Distro: Ubuntu 12.04
    Browser: Chromium

  2. #2
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    (AFAIK) Using wcout instead of cout should work.

  3. #3
    Do you C what I C? jamesallen4u's Avatar
    Join Date
    Oct 2011
    Posts
    43
    Thanks for the reply. I am completely unfamiliar with wcout and I Googled it, but I could not find any examples, can you please give me one?

  4. #4
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    std::wcout << L"привет мир";

    Anyway, stop using ASCII at this point. Start using Unicode. To do that, make sure to:

    - Use the "wide" versions of the library functions, such as std::wcout instead of std::cout.
    - Save the file in a Unicode format.
    - Make sure your compiler can understand and read Unicode-format files.
    - Make sure you are using a font in your console that supports the characters you are trying to write.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  5. #5
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    I thought it would be simple, but I am embarrassed to say that I couldn't write a working solution (in my language).
    Strangely, just using std::cout works. I don't know why; maybe the shell detects non-ASCII automatically based on the locale?
    Code:
    cout<<"মনসিজ";// Seems to work for me.
    //wcout produces ?????
    But any attempt to use wcout, or anything associated with C++11 Unicode, simply produces garbage output, a memory address, or no output at all.
    Can any one of you produce a simple example that works for you, so I can determine whether my knowledge is broken, or my system ?
    (I already tried the ones on Wikipedia and couldn't do any I/O with them)
    Last edited by manasij7479; 01-29-2012 at 02:14 PM.

  6. #6
    Do you C what I C? jamesallen4u's Avatar
    Join Date
    Oct 2011
    Posts
    43
    Thanks guys for your replies. I tried using wcout and saving the file in Visual Studio as Unicode, but it still outputs "?????? ???". A plain cout did not work either.

  7. #7
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by jamesallen4u View Post
    Thanks guys for your replies. I tried using wcout and saving the file in Visual Studio as Unicode, but it still outputs "?????? ???". A plain cout did not work either.
    Same happens for me with wcout. (Even after meeting the conditions given by Elysia on #4)
    Seems that you have to dabble in some Unicode the hard way.

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    I am no Unicode expert, but after some Googling it seems the C++03 standard is broken in this area. I am not sure whether it's solved in C++11, though.
    Anyway, it's more complicated than I initially thought. You have to somehow change your locale, because the standard mandates that wide characters are converted to narrow characters before being printed (then what the hell is the point of Unicode!?). How to do that, I do not know.

  9. #9
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    You can hardcode your strings with escaped characters but it won't be legible.

  10. #10
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by whiteflags View Post
    You can hardcode your strings with escaped characters but it won't be legible.
    I tried that too... but got a memory address (or something that looks like one) as the output.

  11. #11
    the hat of redundancy hat nvoigt's Avatar
    Join Date
    Aug 2001
    Location
    Hannover, Germany
    Posts
    3,130
    Can anybody post some code? Unicode is tricky to get right. You need wcout, you most likely need a wide string literal (one prefixed with L), your source file needs to be saved as Unicode, and the code page in your console needs to support the characters you are trying to print. Elysia already pointed out most of it, but without code it's really hard to guess what little detail might have been overlooked.
    hth
    -nv

    She was so Blonde, she spent 20 minutes looking at the orange juice can because it said "Concentrate."

    When in doubt, read the FAQ.
    Then ask a smart question.

  12. #12
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by nvoigt View Post
    Can anybody post some code? Unicode is tricky to get right. You need wcout, you most likely need a wide string literal (one prefixed with L), your source file needs to be saved as Unicode, and the code page in your console needs to support the characters you are trying to print. Elysia already pointed out most of it, but without code it's really hard to guess what little detail might have been overlooked.
    Here are the permutations that compiled. (The ones I least expected to work, worked correctly.)
    It is probably not the fault of the terminal, as the string can be typed there.
    Code:
    #include <iostream>
    using namespace std;
    int main()
    {
        // When the file is saved as UTF-8:
        //cout<<"মনসিজ"; // Expected output (Why?)
        //wcout<<"মনসিজ"; // No output
        //wcout<<u8"মনসিজ"; // No output
        //wcout<<L"মনসিজ"; // Output: "?????"
        //wcout<<L"\x0987"; // Output: "?" (trying a single char)
        //cout<<L"মনসিজ"; // Output: 0x80486a8
        //cout<<u8"This is a Unicode Character: \u0987.";
            // (Wikipedia example): Worked (Why?)

        // When the file is saved as UTF-16:
            // FAILS TO BUILD
            // with errors like:
            // a.cpp:1:2: warning: null character(s) ignored [enabled by default]
            // a.cpp:1:3: error: invalid preprocessing directive #i
    }
    Strangely, cout seems to work with C++11 Unicode literals and with plain strings containing Unicode characters.
    I thought stream support for Unicode was postponed in C++11 as it was not ready. (Is this gcc-specific?)

    Another thing I tried, which did not work (probably the reason for it being absent in C++11).
    file_1 and file_2 are UTF-16 files.
    Code:
        basic_ifstream<char16_t> ifs("file_1");
        basic_ofstream<char16_t> ofs("file_2");
        basic_string<char16_t> temp;
    
    
        if(ifs && ofs)
        {
            //ofs<<ifs.rdbuf();
                //Gives a Didn't Work #2 (as below) and the output file remains empty.
                //This works with normal streams for copying files.
    
    
            while(getline(ifs,temp)) //
                ofs<<temp;
                //terminate called after throwing an instance of 'std::bad_cast'
                //what():  std::bad_cast
    
    
        }
        else
            cout<<"Didn't work #1"<<endl;
    
    
        if(!(ifs.good()&&ofs.good()))
            cout<<"Didn't work #2"<<endl;

  14. #14
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    I think there is some confusion about the purpose of wide character types in C++; they only serve a purpose for converting non-native strings, and literals are in the native encoding. Thus, the whole "unicode conversion" issue here is a red herring, if the object is simply to use non-ASCII literals in the source, because:

    1) Your system is probably not UTF-16, so will not display UTF-16 output properly. Wide string types are generally UTF-16 on Windows and UTF-32 on Linux.

    2) If you are working on a file in an editor/IDE, that uses the system default, which is not UTF-16. This is true regardless of how you save the file.

    3) Saving the source file as UTF-16 does not mean the executable output will be in UTF-16. Saving the file in UTF-16 is a meaningless operation because the compiler uses only one native encoding; source files in other formats are translated. Eg, with gcc you would use -finput-charset to tell it your source is UTF-16. Then it converts to whatever its native set is.

    4) If the executable output is UTF-16, it will not display properly on your (non UTF-16) system. Ie, wcout is useless on a non UTF-16 system, period. It may work with ascii characters (when the executable output is the same as the system's), because converted to UTF-16 those are just ascii values alternating with 0x00, which is nothing when printed.

    So:

    Code:
    #include <iostream>
    #include <string>
    
    using namespace std;
    
    int main (void) {
    	string russian("привет мир");
    	string bengali("মনসিজ ");
    
    	cout << russian << endl;  // ok
    	wcout << "привет мир" << endl;
    	wcout << U"привет мир" << endl;
    	wcout << "okay" << endl; // ok
    	cout << U"okay" << endl;
    	cout << bengali << endl; //ok
    	return 0;
    }
    Created on a UTF8 system.

    g++ -Wall -std=c++0x -fexec-charset=UTF-16 test.cpp

    Output is all gibberish.

    g++ -Wall -std=c++0x -fexec-charset=UTF-8 test.cpp

    The lines marked "ok" work. Why?

    Code:
        wcout << "привет мир" << endl;
        wcout << U"привет мир" << endl;
    Doesn't work because that is non-ascii UTF-16 output on a UTF-8 system.

    Code:
    	cout << U"okay" << endl;
    Doesn't work because U"okay" is a char32_t string, and cout has no overload that prints one as text; the pointer value is printed instead.

    Ie, the reason the literal ones work:

    Code:
    	string russian("привет мир");
    	cout << russian << endl;
    Is because the editor, the console, and the compiler are all using the same encoding.

    The wcout "okay" works because of what I said in point 4 above.

    So the simplest thing, portability wise, is to have the source available in different encodings, or else compile the executable to use different encodings.
    Last edited by MK27; 01-30-2012 at 10:01 AM.

  15. #15
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Thanks, that cleared up some confusion (without delving into the low levels).

    So, as long as everything is, say, UTF-8 (as it is in my case), can I make a program language-independent simply by maintaining a resource 'dictionary' for all the literals being used?

    But I can't understand where the encoding of the compiled executable factors into this.
    Why does it matter when the other encoding is simply another data type ?
    For example, shouldn't I be able to do file IO with

    "basic_ifstream<char16_t>" , "basic_ofstream<char16_t>" , etc. irrespective of how the executable was made ?
    Last edited by manasij7479; 01-30-2012 at 10:04 AM.
