Thread: GCC not following C99+?

  1. #1
    Registered User
    Join Date
    Feb 2019

    GCC not following C99+?

    Please, check my interpretation:

    ISO 9899:1999 and later says:

    "An identifier, comment, string literal, character constant, or header name shall consist of a sequence of valid multibyte characters."

    And Annex D (normative) lists the "valid multibyte characters".

    So identifiers like "nš" or "eⁿ" should be valid, and GCC says in its documentation of the -fextended-identifiers option:

    "Accept universal character names in identifiers. This option is enabled by default for C99 (and later C standard versions) and C++."

    But such identifier names aren't accepted! At least, I got a compilation error. This doesn't compile:

    int nš=3;
    The only "extended" character accepted is "$", but it isn't normative (not in Annex D).

    Any thoughts?

    PS: By the way, clang and Visual C++ 19 (when the source code is encoded in UTF-16LE) follow Annex D!
    Last edited by flp1969; 08-16-2020 at 04:44 AM.

  2. #2
    Registered User
    Join Date
    Feb 2019
    Ahhhh... my locale setting:
    $ set | grep ^LC
    Even if I set LC_ALL to 'pt_BR.UTF-8', gcc still doesn't compile the identifier with the "extended" name.

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Are you compiling with gcc 10? From this bug report, it looks like you're 5 years (to the month, haha) too late to find the missing feature, and a fix was finally released as part of the gcc 10 release this year.

    Yeah, you can compare the manual entry on character sets for gcc 9.3.0 vs gcc 10.2.0:
    Quote Originally Posted by gcc 9.3.0 manual
    In identifiers, characters outside the ASCII range can only be specified with the ‘\u’ and ‘\U’ escapes, not used directly.
    Quote Originally Posted by gcc 10.2.0 manual
    In identifiers, characters outside the ASCII range can be specified with the ‘\u’ and ‘\U’ escapes or used directly in the input encoding.
    Also, I note that this explains what the "universal character names" option is about: it refers to the \u and \U escapes as applied to identifiers etc., not to the direct use of multibyte characters. This feature goes by the same name in the C standard, and that is what Annex D is about. Annex D is not about "valid multibyte characters", as those are locale-specific according to the standard. Hence, whether nš is a valid identifier depends on the locale, not Annex D, and since gcc 9 documented direct use as not supported, it is technically standard conforming (though we could argue that this is not in the spirit of the rule, as it is a technical limitation rather than one imposed by "local conventions").
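
    To illustrate the distinction, here is a minimal sketch of an identifier spelled with a universal character name, which is the form that -fextended-identifiers covers. It assumes a compiler that accepts \u escapes in identifiers by default (the gcc documentation quoted in post #1 says this holds for C99 and later). The function name read_ucn_identifier is made up for this example.

```c
/* \u0161 is the universal character name for U+0161 (the letter s with caron).
   The identifier below is the escaped spelling of the "n + s-caron" name
   from post #1; only ASCII bytes appear in the source file. */
static int n\u0161 = 3;

/* Hypothetical helper: returns the value stored in the UCN-spelled variable. */
int read_ucn_identifier(void)
{
    return n\u0161;
}
```

    Calling read_ucn_identifier() from main should return 3. The escaped spelling avoids any dependence on the source file's encoding, which is why it is a reasonable fallback when direct non-ASCII characters in identifiers are rejected.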
    Last edited by laserlight; 08-16-2020 at 04:34 PM.

  4. #4
    Registered User
    Join Date
    Feb 2019
    Thanks, laserlight.

    I am using gcc 7.3 and 8.4 (available in Ubuntu repos). Good to know they finally corrected this.

