Thread: International Phonetic Alphabet in C

  1. #16
    Registered User
    Join Date
    Aug 2010
    Posts
    27
    Wondering if I should install Linux as my operating system, since flp1969 said that the code he suggests works in Linux.
    Does anyone have any thoughts on that?

  2. #17
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by rivkyfried1 View Post
    Wondering if I should install Linux as my operating system, since flp1969 said that the code he suggests works in Linux.
    Does anyone have any thoughts on that?
    Windows is a weird environment... As I said, use the Console API functions to print on the Windows Console. WriteConsoleW works fine:
    Code:
    #include <windows.h>
    
    int main( void )
    {
      HANDLE hc;
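      WCHAR msg[] = L"\u014b\n";  // U+014B (ŋ) followed by a newline
    
      hc = GetStdHandle( STD_OUTPUT_HANDLE );
      // The length is counted in WCHARs (not bytes) and excludes the terminating NUL.
      WriteConsoleW( hc, msg, sizeof msg / sizeof msg[0] - 1, NULL, NULL );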
    }
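    Linux/UNIXes tend to use UTF-8 as the default charset on the terminal nowadays. The Windows Console uses a single-byte code page by default (usually an OEM code page such as CP437 or CP850, or WINDOWS-1252 if it was switched to that). That's probably why you get K as output from wprintf on the Windows Console: U+014B (ŋ) is stored as the 16-bit value 0x014B (bytes 0x4B, 0x01 in little-endian order), and 0x4B is 'K'.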
    Last edited by flp1969; 09-23-2022 at 09:16 AM.

  3. #18
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    wchar_t should be a 32-bit type, but compilers for Windows tend to make it 16 bits wide. The whole idea of wide chars is to use an intermediary universal encoding to translate higher code points to a desired charset... Since every Unicode code point fits in 32 bits (UTF-8, UTF-16... are transformations of those code points), it makes sense that wchar_t should be a simple encoding (1 char = 1 code point).

    I have some problems using wchar.h functions on Windows as well...

    It is my opinion that Windows and MSVC are ........ (censored: something you flush down a toilet) and should be avoided at all costs.

  4. #19
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Quote Originally Posted by flp1969 View Post
    wchar_t should be a 32-bit type, but compilers for Windows tend to make it 16 bits wide. The whole idea of wide chars is to use an intermediary universal encoding to translate higher code points to a desired charset... Since every Unicode code point fits in 32 bits (UTF-8, UTF-16... are transformations of those code points), it makes sense that wchar_t should be a simple encoding (1 char = 1 code point).

    I have some problems using wchar.h functions on Windows as well...

    It is my opinion that Windows and MSVC are ........ (censored: something you flush down a toilet) and should be avoided at all costs.
    I've had to deal briefly with WCHAR and TCHAR strings in Windows. I imagine I'd develop PTSD if I had to do that on a regular basis.

  5. #20
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by christop View Post
    I've had to deal briefly with WCHAR and TCHAR strings in Windows. I imagine I'd develop PTSD if I had to do that on a regular basis.
    Well... I believe I would have some kind of brain injury if I had to deal with Windows on a regular basis... Take a look:
    Code:
    // test.c
    #include <windows.h>
    
    int main( void )
    {
      HANDLE hc;
      static const WCHAR msg1[] = L"\u20ac\n";  // EURO SIGN (U+20AC) -- ok!
      static const WCHAR msg2[] = L"\U0001d11e\n"; // G CLEF (U+1D11E), a surrogate pair in UTF-16 -- doesn't work!
    
      hc = GetStdHandle( STD_OUTPUT_HANDLE );
      // Lengths are counted in WCHARs (not bytes) and exclude the terminating NUL.
      WriteConsoleW( hc, msg1, sizeof msg1 / sizeof msg1[0] - 1, NULL, NULL );
      WriteConsoleW( hc, msg2, sizeof msg2 / sizeof msg2[0] - 1, NULL, NULL );
    }
    Compiled with MSVC (so there is no doubt about the compiler):

    [Attached screenshot: untitled.png]

    It seems the Windows Console doesn't deal very well with surrogate pairs.
    Last edited by flp1969; 09-23-2022 at 03:02 PM.
