Thread: Unicode identifiers

  1. #1
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072

    Unicode identifiers

    To those of you who've tested the 2005 Express beta from Microsoft:

    Does the latest C++ compiler from Microsoft support Unicode characters in identifiers?

    I.e. will this code compile:
    Code:
    int main()
    {
      int åtta= 8;
    }
    I'm just curious. I can't remember seeing any compiler that supports this.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  2. #2
    Registered User
    Join Date
    Mar 2004
    Posts
    536
    This works:

    Code:
    #include <stdio.h>
    
    int main()
    {
      int åtta= 8;
      char str[] = "åtta";
    
      printf("s: <%s>\n", str);
      printf("åtta = %d\n", åtta);
      return 0;
    }
    Output on Windows XP:

    s: <σtta>
    σtta = 8
    The compiler version is
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.40607.16 for 80x86
    Regards,

    Dave
    Last edited by Dave Evans; 12-09-2004 at 04:48 PM.

  3. #3
    S Sang-drax's Avatar
    Nice. Thank you.
    I'll download the beta now.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Why would you want such a thing?
    I doubt such code would compile on every C++ compiler.

  5. #5
    S Sang-drax's Avatar
    I don't know. Perhaps I can throw in some Unicode characters when helping with the stupider kind of homework questions. When they complain, I'll just refer them to the standard.

    I noticed that my current compiler couldn't handle it, and I just wanted to know whether Microsoft's new compiler was any better.

  6. #6
    and the hat of int overfl Salem's Avatar
    Perhaps you're confusing the source character set (the one the program is written in) with the execution character set (the one the program is able to use to communicate with the outside world)
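
    To make that distinction concrete, here is a minimal sketch that is not from the thread. It spells "åtta" byte by byte with hex escapes, so the result does not depend on how the source file is saved. The names byte_count, atta_cp1252, atta_utf8 and check_lengths are made up for illustration. Byte 0xE5 is å in Windows-1252 but σ in code page 437, so a console using code page 437 would show the first string as σtta.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of bytes in a narrow string, excluding the terminator.
std::size_t byte_count(const char* s)
{
    return std::strlen(s);
}

// "åtta" written out as explicit bytes in two different execution encodings.
const char atta_cp1252[] = "\xE5tta";      // å as the single byte 0xE5 (Windows-1252 / Latin-1)
const char atta_utf8[]   = "\xC3\xA5tta";  // å as the two bytes 0xC3 0xA5 (UTF-8)

// Usage: the same four letters occupy a different number of bytes per encoding.
void check_lengths()
{
    assert(byte_count(atta_cp1252) == 4);
    assert(byte_count(atta_utf8) == 5);
}
```

    The compiler only decides which bytes end up in the program. How those bytes are drawn is up to whatever displays them.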

  7. #7
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    I did some testing with MSVC.NET a while back. If I remember correctly, it successfully compiled source code saved as UTF-8 or UTF-16LE. This was with the command-line version; I'm not sure whether the IDE offers the same support. I only tested Unicode characters in string literals, so I can't vouch for Unicode characters in variable names.

  8. #8
    S Sang-drax's Avatar
    Quote Originally Posted by Salem
    Perhaps you're confusing the source character set (the one the program is written in) with the execution character set (the one the program is able to use to communicate with the outside world)
    I don't have the standard on this computer, so I can't quote anything. I'm quite sure some Unicode characters can be used in identifier names, though.
    I'll have to get back to this when my Internet connection starts working. It has been down for days.

  9. #9
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Perhaps this is what you are looking for, Sang-drax?
    Section 2.10.1
    1 An identifier is an arbitrarily long sequence of letters and digits. Each universal-character-name in an identifier shall designate a character whose encoding in ISO 10646 falls into one of the ranges specified in Annex E. Upper- and lower-case letters are different. All characters are significant. 20)
    [...]
    20) On systems in which linkers cannot accept extended characters, an encoding of the universal-character-name may be used in forming valid external identifiers. For example, some otherwise unused character or sequence of characters may be used to encode the \u in a universal-character-name. Extended characters may produce a long external identifier, but C++ does not place a translation limit on significant characters for external identifiers. In C++, upper- and lower-case letters are considered different for all identifiers, including external identifiers.
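
    A minimal sketch of what that paragraph permits, assuming a compiler that accepts universal-character-names in identifiers (not every compiler does, as noted earlier in the thread). \u00E5 is the ISO 10646 code point for å, so the identifier below is "åtta" spelled portably. The function names eight and check_eight are made up for illustration.

```cpp
#include <cassert>

// The identifier is written with a universal-character-name instead of a raw å,
// so it does not depend on the encoding the source file is saved in.
int eight()
{
    int \u00E5tta = 8;   // declares the variable "åtta"
    return \u00E5tta;    // refers to the same variable
}

// Usage: the identifier behaves like any other local variable.
void check_eight()
{
    assert(eight() == 8);
}
```

    Whether a raw å in the source is treated as the same identifier depends on how the compiler maps the source file's encoding to universal-character-names.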

  10. #10
    S Sang-drax's Avatar
    Yup, Annex E at the end of the document contains the valid characters that can be used. Quite a few, actually. BTW, my connection is up again.
