GCC not following C99+?

**flp1969** · 08-16-2020

Please, check my interpretation:

ISO/IEC 9899:1999 and later say in section 5.2.1.2:

"An identifier, comment, string literal, character constant, or header name shall consist of a sequence of valid multibyte characters."

And Annex D (normative) lists the "valid multibyte characters".

This way, identifiers like "nº" or "eⁿ" are valid... and GCC says in its documentation of the -fextended-identifiers option:

"Accept universal character names in identifiers. This option is enabled by default for C99 (and later C standard versions) and C++."

But such identifier names aren't accepted! At least, I got a compilation error... This doesn't compile:

Code:

int nº=3;

The only "extended" character accepted is "$", but it isn't normative (not in Annex D).

Any thoughts?

PS: By the way... clang and Visual C++ 19 (when the source code is encoded in UTF-16LE) follow Annex D!

**flp1969** · 08-16-2020

Ahhhh... my locale setting:

Code:

$ set | grep ^LC
LC_ADDRESS=pt_BR.UTF-8
LC_CTYPE=pt_BR.UTF-8
LC_IDENTIFICATION=pt_BR.UTF-8
LC_MEASUREMENT=pt_BR.UTF-8
LC_MONETARY=pt_BR.UTF-8
LC_NAME=pt_BR.UTF-8
LC_NUMERIC=pt_BR.UTF-8
LC_PAPER=pt_BR.UTF-8
LC_TELEPHONE=pt_BR.UTF-8
LC_TIME=pt_BR.UTF-8

Even if I set LC_ALL to 'pt_BR.UTF-8', gcc still doesn't compile the identifier with the "extended" name.

**laserlight** · 08-16-2020

Are you compiling with gcc 10? From this bug report, it looks like you're 5 years (to the month, haha) too late to find the missing feature, and a fix was finally released as part of the gcc 10 release this year.

EDIT:
Yeah, you can compare the manual entry on character sets for gcc 9.3.0 vs gcc 10.2.0:

From the gcc 9.3.0 manual:

"In identifiers, characters outside the ASCII range can only be specified with the ‘\u’ and ‘\U’ escapes, not used directly."

From the gcc 10.2.0 manual:

"In identifiers, characters outside the ASCII range can be specified with the ‘\u’ and ‘\U’ escapes or used directly in the input encoding."

Also, I note that this explains what the "universal character names" in the option's description are: the \u and \U escapes as applied to identifiers etc., not the direct use of multibyte characters. The C standard uses the same name for this feature, and that is what Annex D is about, not "valid multibyte characters", which are locale-specific according to 5.2.1.2. Hence, whether nº written directly is a valid identifier depends on the locale, not on Annex D. Since gcc 9 documented direct use as not supported, it is technically standard conforming (though we could argue that this is not in the spirit of the rule, as it is a technical limitation rather than one imposed by "local conventions").
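To make the distinction concrete, here is a minimal sketch of the escape form, which is what the -fextended-identifiers documentation quoted above refers to. It assumes a compiler in C99-or-later mode with that option enabled (the documented default). U+00BA is the code point of 'º', so the escape spells the same identifier as "nº". The helper names read_n and check_n are made up for illustration.

Code:

```c
#include <assert.h>

/* The UCN \u00BA stands for 'º' (U+00BA), so this declares the
   identifier "nº" without using any non-ASCII byte in the source. */
int n\u00BA = 3;

/* Hypothetical helper: reads the variable back through the same spelling. */
int read_n(void)
{
    return n\u00BA;
}

/* Hypothetical self-check: the escaped identifier behaves like any other. */
void check_n(void)
{
    assert(read_n() == 3);
}
```

With gcc 10 or later, per the manual entry quoted above, the same variable should also be reachable by typing nº directly in a UTF-8 source file; on older releases only the escaped spelling is documented as supported.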

**flp1969** · 08-17-2020

Thanks, laserlight.

I am using gcc 7.3 and 8.4 (available in Ubuntu repos). Good to know they finally corrected this.

[]s
Fred
