Idea for new literal

**awsdert** · 10-14-2019

So I had been thinking about how to give access to my ubase function in analysed text while still at the literal stage, before looking for functions. This is what I came up with:
b/B#[#][l/L]"[+/- #]"[u/l/etc]
The b and B are interchangeable, just as you normally find with 0x#/0X# and 0b#/0B#. The decimal number before the string is always the base, the string contains the actual number, and the part after the string is the usual suffix you'd find after an integer literal. For example, b2"-1001" results in -9. The keen-eyed will notice the [l/L]: that is an optional modifier that sets the boolean lowislow to true. For the unfamiliar, it means that instead of treating uppercase letters as low values and lowercase letters as high values in a base > 36 scenario, the reverse is applied: uppercase is treated as high values and lowercase as low values. When base <= 36 it is simply ignored, since there is no use for it. Another example: b62l"+z" equals 35 (0-9, then a-z = 10-35), whereas b62"z" equals 61 (0-9, A-Z = 10-35, a-z = 36-61).

This is doable in my character-by-character analysis because I've mostly implemented it already; I just have bugs here and there that I need to squash before creating an actual library. What I'd like to know from this thread is whether anyone sees potential issues, for example b/B already having a meaning when prefixed to string literals. I never checked, since I doubted it, so I'm going to work on those bugs and come back to see if anyone has replied with contrary information.

**laserlight** · 10-14-2019

C11 only specifies these encoding prefixes for string literals: u8, u, U, L. Hence, your b# and B# encoding prefixes would not conflict as a language extension.

The syntax specified by the standard does not provide for encoding suffixes, and you may have to consider that adjacent string literals are concatenated, so there would be conflict between your choice of L for a suffix and the standard use of L for a prefix. It may be simpler to just incorporate this into the prefix.

Another consideration is that you have what appears to be a string literal, but from 'an example is b2"-1001" will result in -9' it sounds like you have in mind integer literals. If so, this is likely to be a bad idea because it introduces inconsistency and hence possible confusion. On the other hand, if you introduce entirely different syntax to generalise integer constants, it would be yet another new syntax to learn and parse for, whereas this language extension only (?) requires understanding that the type is wildly different from what one would expect from a string literal.

Personally, I think reusing string literals for integer constants is the kind of thing that language designers will get mercilessly mocked for ten years down the road, but in the end it's your choice since you're the one implementing the language extension. (And who's to say whether people will or will not like whatever alternative you might come up with? Programmers can be quite a critical bunch when it comes to their pet language peeves.)

**awsdert** · 10-14-2019

Originally Posted by laserlight

C11 only specifies these encoding prefixes for string literals: u8, u, U, L. Hence, your b# and B# encoding prefixes would not conflict as a language extension.

Thought so, just needed to be sure

Originally Posted by laserlight

The syntax specified by the standard does not provide for encoding suffixes, and you may have to consider that adjacent string literals are concatenated, so there would be conflict between your choice of L for a suffix and the standard use of L for a prefix. It may be simpler to just incorporate this into the prefix.

To begin with, this is supposed to be a new literal standard, so a non-conforming compiler should, and would, abort compilation at this point and just emit errors instead, so I'm not too worried about that. Also, since I'm writing my own precompiler for this anyway, people could just run that first and then pass the result on to gcc if they wanted to. Either way, the compiler I'm trying to write will support it properly and will treat any string literal using l or L as a suffix as invalid, aborting compilation and leaving only the errors and warnings. I originally considered extending 0b and 0B to do something similar, but then it occurred to me that it could confuse decimal with binary, and even confuse where the base ends and the number begins; a clear way of delimiting the value is needed to know which base to expect. String literals were my first thought because they can be macro'd into a reasonably graceful fallback to a function call instead, like so:

Code:

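#ifdef __misty__
/* Native literal such as b10"0"; only the author's own compiler accepts it */
#define UBASE( BASE, VAL, SFX ) b##BASE##VAL##SFX
#else
/* Fallback: call ubasestr() at run time; it must be declared elsewhere */
#define UBASE( BASE, VAL, SFX ) ubasestr( BASE, VAL, #SFX )
#endif
// Terrible way to use this but fully possible; the empty third argument
// leaves SFX blank (C99 allows empty macro arguments)
int main() { return UBASE(10,"0",); }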

Originally Posted by laserlight

Another consideration is that you have what appears to be a string literal, but from 'an example is b2"-1001" will result in -9' it sounds like you have in mind integer literals. If so, this is likely to be a bad idea because it introduces inconsistency and hence possible confusion. On the other hand, if you introduce entirely different syntax to generalise integer constants, it would be yet another new syntax to learn and parse for, whereas this language extension only (?) requires understanding that the type is wildly different from what one would expect from a string literal.

Well, the other possibility I can think of is using the ' character. Compilers will definitely complain about the length, and it lines up with multi-character 'names', which are also generally understood to equate to an integer. It's fully trivial to switch to in my code, so that's not an issue either.

Originally Posted by laserlight

Personally, I think reusing string literals for integer constants is the kind of thing that language designers will get mercilessly mocked for ten years down the road, but in the end it's your choice since you're the one implementing the language extension. (And who's to say whether people will or will not like whatever alternative you might come up with? Programmers can be quite a critical bunch when it comes to their pet language peeves.)

They can mock me all they like. I at least tried to come up with something suitable and even consulted others for their ideas and concerns (this thread being the only instance thus far); I'd like to see them do better.

**awsdert** · 10-14-2019

For everyone's reference, I've finally squashed the bugs I could see on the integer front. I'm going to finish string handling next before moving on to floating-point numbers, which I forgot about. My current output:

Code:

make char.run (in directory: /run/media/zxuiji/ZXUIJI_1TB/github/mc)
cc -fPIC -Wall -Wno-multichar  -shared -o ./libnext.so -c next.c
cc -fPIC -Wall -Wno-multichar  -shared -o ./libtsc.so -c tsc.c
cc -fPIC -Wall -Wno-multichar  -shared -o ./libbase62.so -c base62.c
cc -fPIC -Wall -Wno-multichar  -D OUT=char.elf -o ./char.elf char.c ./libnext.so ./libtsc.so ./libbase62.so
./char.elf
char.c:692:literalc(): Error: 0x00000054, 84, Invalid or incomplete multibyte or wide character, Info:
char.c:692:literalc(): Error: 0x00000054, 84, Invalid or incomplete multibyte or wide character, Info:
std_encoding = 'UTF-8'
Compiler Tests:
'abcd': 0x61626364, 'd', dcba
'\u2ea2': 0x00E2BAA2, '\Uffffffff', \Uffffffff\Uffffffff\Uffffffffa\u2ea2': 0x61E2BAA2, '\Uffffffff', \Uffffffff\Uffffffff\Uffffffff'\u2ea2a': 0xE2BAA261, 'a', a\Uffffffff\Uffffffff\Uffffffffnit tests:
010 = 8         (signed)
0b101 = 5         (signed)
99999 = 99999         (signed)
0x5f5e0ff = 99999999         (signed)
-010 = -8         (signed)
-0b101 = -5         (signed)
-99999 = -99999         (signed)
-0x5f5e0ff = -99999999         (signed)
-1u = 4294967295u      (unsigned)
-1uhh = 255uhh    (unsigned)
-1ull = 18446744073709551615ull    (unsigned)
-1ui10 = 1023ui10 (unsigned)
-99 = -99         (signed)
-99hh = -99hh       (signed)
-99ll = -99ll       (signed)
-99i10 = -99ui10   (signed)
0xFFFFFFFFFFFFFFFF = -1         (signed)
0xFFFFFFFFFFFFFFFFh = -1h        (signed)
0xFFFFFFFFFFFFFFFFhh = -1hh       (signed)
0xFFFFFFFFFFFFFFFFl = -1l        (signed)
0xFFFFFFFFFFFFFFFFll = -1ll       (signed)
0xFFFFFFFFFFFFFFFFi9 = -1ui9   (signed)
0xFFFFFFFFFFFFFFFFu = 4294967295u      (unsigned)
0xFFFFFFFFFFFFFFFFuh = 65535uh     (unsigned)
0xFFFFFFFFFFFFFFFFuhh = 255uhh    (unsigned)
0xFFFFFFFFFFFFFFFFul = 18446744073709551615ul     (unsigned)
0xFFFFFFFFFFFFFFFFull = 18446744073709551615ull    (unsigned)
0xFFFFFFFFFFFFFFFFui9 = 511ui9 (unsigned)
b36"z" = 35         (signed)
b16"fff"hh = -1hh       (signed)
b16"fff"hH = -1hh       (signed)
\u2ea2 =
'abcd' =  61626364, 'dcba'
'\u2ea2' =  00E2BAA2, '\Uffffffff\Uffffffff\Uffffffff'a\u2ea2' =  61E2BAA2, '\Uffffffff\Uffffffff\Uffffffff
'\u2ea2a' =  E2BAA261, 'a\Uffffffff\Uffffffff\Uffffffff"\u2ea2" =  00A2BAE2, '\u2ea2'
"abcd\u2ea2" =  64636261, 'abcd\u2ea2'
"\u2ea2abcd" =  61A2BAE2, '\u2ea2abcd'
"abcd\u2ea2wxyz" =  64636261, 'abcd\u2ea2wxyz'
u8'abcd' =  61626364, 'dcba'
u8'\u2ea2' =  00E2BAA2, '\Uffffffff\Uffffffff\Uffffffffu8'a\u2ea2' =  61E2BAA2, '\Uffffffff\Uffffffff\Uffffffff
u8'\u2ea2a' =  E2BAA261, 'a\Uffffffff\Uffffffff\Uffffffffu8"\u2ea2" =  00A2BAE2, '\u2ea2'
u8"abcd\u2ea2" =  64636261, 'abcd\u2ea2'
u8"\u2ea2abcd" =  61A2BAE2, '\u2ea2abcd'
u8"abcd\u2ea2wxyz" =  64636261, 'abcd\u2ea2wxyz'
u'cd' =  00FFFE63, '
u'\u2ea2' =  0000A22E, '
u'a\u2ea2' =  2E6100A2, '
u'\u2ea2a' =  00A22E61, '
u"\u2ea2" =  00002EA2, '
Last character was '        '
Character hex: 00 00 00 00 00 00
rm libtsc.so libbase62.so char.elf libnext.so
char.c:692:literalc(): Error: 0x00000054, 84, Invalid or incomplete multibyte or wide character, Info:
char.c:692:literalc(): Error: 0x00000054, 84, Invalid or incomplete multibyte or wide character, Info:
make: *** [makefile:36: char.run] Error 1
Compilation failed.

**awsdert** · 10-16-2019

New idea, not using strings either. It definitely causes an error with cc (presumably redirecting to gcc), as I've just tested:

Code:

0~62~z
