Thread: w_char/arrays/chinese

  1. #1
    Registered User
    Join Date
    Mar 2003
    Posts
    30

    Question w_char/arrays/chinese

    I challenge/implore anyone to direct me to a tutorial that explains the use of w_char clearly.

    Also, is it humanly possible to use C++ and w_char to fill an array with Simplified Chinese characters, manipulate them, and spit them out the way you want? I got it to work with some of the characters but it wouldn't take all of them...now where'd I put the code!!?

    Stink, it's on my Chinese 98 partition...I'll have to come back and post the code l8r.

  2. #2
    Pursuing knowledge confuted's Avatar
    Join Date
    Jun 2002
    Posts
    1,916
    I would give you a link to www.google.com, but I see that you're in China. Anyway, I believe w_char is MS only, not ANSI... but don't you just use it like a normal char?
    Away.

  3. #3
    Just because ygfperson's Avatar
    Join Date
    Jan 2002
    Posts
    2,490
    Originally posted by blackrat364
    I would give you a link to www.google.com, but I see that you're in China. Anyway, I believe w_char is MS only, not ANSI... but don't you just use it like a normal char?
    I think w_char is a more general thing, portable across compilers.

    http://www.cslab.vt.edu/manuals/glib...e/libc_67.html

  4. #4
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    wchar_t is standard, for Unicode. I've used it before with Japanese characters.

    Wide characters are pretty much exactly like their non-wide counterparts. E.g.:

    wchar_t buffer[] = L"This is a Unicode string";
    wcout << buffer;

    If you're programming for Windows, it has a very strong capability to easily handle Unicode; check out TCHAR.H for a ton of functionality.

    If you're writing windows programs, I recommend using a string table and the related commands to manipulate Unicode strings; string tables make portability very, very easy.

    Of course, the WinAPI is neither platform independent nor ANSI standard. wchar_t and std::wstring are completely ANSI standard, though, so it should be quite simple to write in Unicode. In fact, I compile nothing BUT Unicode programs, although I always write source to be compilable in either ANSI or Unicode.

    If you can be more specific about what you're looking to see, I can post example code. I am not at home for a few more days, but I can post some examples then.
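
    In the meantime, here is a minimal sketch of the "just like their non-wide counterparts" point, using only std::wstring and the standard algorithms. The helper names sortedCopy and countChar are made up for illustration, and the sort is by raw character code, not by any Chinese dictionary order.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of `text` with its wide characters
// sorted by numeric code value (not by any language-specific collation).
std::wstring sortedCopy(std::wstring text)
{
    std::sort(text.begin(), text.end());
    return text;
}

// Hypothetical helper: counts how many times `ch` occurs in `text`.
std::size_t countChar(const std::wstring& text, wchar_t ch)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), ch));
}
```

    Something like sortedCopy(L"cab") would be called exactly the way you would call the equivalent std::string version; only the L prefix and the wide types change.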

  5. #5
    Registered User
    Join Date
    Mar 2003
    Posts
    30

    Post more details

    OK, I was trying to load 188 character radicals (each Chinese character has a base part called a radical) into an array, line by line. There are 12 sets of radicals, numbered 1 to 12 according to how many strokes are in the radical. I was thinking maybe it would be better to have 12 text files and load only the set of radicals you want at runtime instead of loading all 12 sets. I did get it to load all 12 sets, and it worked pretty well except for the 3 stroke radical set. About three quarters of the way through that set, a few characters displayed as Chinese characters that didn't belong there, as blank spaces, or as some funky ASCII characters; the last few characters in the set then displayed correctly.

    Anyway, I've tried just loading one set at a time and it works for the smaller sets but not the bigger sets. For the bigger sets it displays most of them along with some empty space and a funky ASCII char or 2 (in Chinese DOS). I'm guessing that each Chinese character takes up 2 bytes instead of 1 for an ASCII char, so I'm going to have to figure out how to load an array (or vector, once I figure out how to use 'em) with 2 byte characters and be able to sort, display, etc. I'm thinking that wchar_t is my best option, along with trying to write a Windows/Unicode program that handles the Chinese characters easily. I've also read that trying to do this in my OS (98se) is not as easy as with Win2k and NT. I've been reading about Unicode, writing Windows programs in C++ (though there's not a whole lot of clear guidance on that one, esp. if you're using dev-cpp), and trying to find tutorials on wchar_t.

    Some people are telling me that I should just write this program in Java or VB with unicode but I've been studying C++ and I really don't want to study more than one language at a time or skip completely to another language. Besides, I really like C++. I just like the style of C++ and I'm starting to comprehend most of the basics.

    Thanks a lot for any advice you can offer.

    Code:
    //FeedRadArray.cpp
    //it compiles and works!!!  Woohoo!!!
    //the cin.getline() function with 3 parameters

    //works beautifully except for a few of the characters in the
    //3 stroke radical category

    #include <iostream>
    #include <cstdlib>
    #include <fstream>

    using namespace std;

    int main()
    { const int ROWS = 12;              //number of lines the array can hold
      const int COLS = 120;             //max bytes per line, including the '\0'
      char arry[ROWS][COLS] = {};       //zero-filled so unread rows print as empty

      cout << "Loading radicals3.txt into the array...\n" << endl;

      ifstream infile("radicals3.txt");
      if(!infile)
      { cout << "Could not open radicals3.txt" << endl;
        return 1;
      }

      for(int a=0; a<ROWS; a++)         //read at most ROWS lines; arry has exactly ROWS rows
      { infile.getline(arry[a], COLS, '\n');
      }

      for(int b=0; b<6; b++)            //first half, rows 0 to 5
      { cout << "\narry [" << b << "] " << arry[b] << endl;
      }

      system("pause");                  //so i can see it all on the DOS screen b4 it disappears

      for(int b=6; b<ROWS; b++)         //second half, rows 6 to 11
      { cout << "\narry [" << b << "] " << arry[b] << endl;
      }

      system("pause");
      return 0;
    }
    Last edited by Swaine777; 06-25-2003 at 01:17 AM.

  6. #6
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    Hmm, 98se will be trickier. The WinNT operating systems (NT, 2K, XP, Longhorn) use Unicode for foreign characters. The 16-bit compatible operating systems (Win95, 98, Me) use what are called code pages. Under Unicode, things are nice. On Windows every wchar_t is 16 bits, and all characters are always available. Using code pages, only certain characters are legal at any one time (e.g., using a Chinese code page you won't have Korean, under Hebrew you might not have Greek, etc.), and characters can take a variable number of bytes. Some characters will be one byte, some two, etc.

    You DON'T typically use wchar_t with multibyte character sets (code pages); a single character is NOT always the same length. You typically use an unsigned char, keeping in mind that one actual character from input or from the screen may be more than one character in your array.
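
    As a rough sketch of what "more than one character in your array" means in practice, the code below walks a byte string and counts real characters. It ASSUMES code page 936 (GBK), where a byte from 0x81 to 0xFE starts a two-byte character; the names isLeadByte and countCharacters are made up for illustration. On Windows, IsDBCSLeadByte performs this kind of check against the active code page instead of a hard-coded range.

```cpp
#include <cstddef>
#include <string>

// ASSUMPTION: code page 936 (GBK). A byte in 0x81..0xFE starts a two-byte
// character; any other byte is treated as a one-byte character.
bool isLeadByte(unsigned char c)
{
    return c >= 0x81 && c <= 0xFE;
}

// Hypothetical helper: counts characters (not bytes) in a GBK byte string.
std::size_t countCharacters(const std::string& bytes)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        // A lead byte that has a following byte consumes two bytes;
        // everything else (ASCII, or a truncated lead byte) consumes one.
        if (isLeadByte(c) && i + 1 < bytes.size())
            i += 2;
        else
            i += 1;
        ++count;
    }
    return count;
}
```

    The same stepping logic is what sorting or displaying code would need, so that a two-byte character is never split in the middle.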

    I never really did much with internationalization pre-Unicode; I've pretty much used Unicode only. You should be able to get Microsoft Layer for Unicode, which allows Win95, 98, Me to run Unicode programs. Then you can hopefully compile and run as pure Unicode.
    Last edited by Cat; 06-25-2003 at 08:53 AM.

  7. #7
    Registered User
    Join Date
    Jan 2003
    Posts
    648
    This should work just fine:
    format <original - replace with>

    char - wchar_t
    ifstream - wifstream
    cout - wcout
    "..." - L"..."
    'x' - L'x'
    system("pause") - _wsystem(L"pause")

  8. #8
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    Originally posted by Speedy5
    This should work just fine:
    format <original - replace with>

    char - wchar_t
    ifstream - wifstream
    cout - wcout
    "..." - L"..."
    'x' - L'x'
    system("pause") - _wsystem(L"pause")
    That would be great, IF he was using Unicode. Read his last post -- he's in 98se, so he's programming using a MBCS, not Unicode, and there's no guarantee that each character is 16 bits (in fact, many will probably not be).
