Thread: Idea for new literal

  1. #1
    Registered User awsdert (Join Date: Jan 2015, Posts: 1,733)

    Idea for new literal

    So I had been thinking about how to make my ubase function available in analysed text while still at the literal stage, before any function lookup happens. This is what I came up with:
    b/B#[#][l/L]"[+/- #]"[u/l/etc]
    The b and B are interchangeable, just as 0x/0X and 0b/0B are. The decimal number before the string is always the base, the string holds the actual number, and the part after the string is the usual suffix you find after an integer literal. For example, b2"-1001" results in -9. The keen-eyed will notice the [l/L]: that optional modifier sets the boolean lowislow to true. For the unfamiliar, in a base > 36 scenario it means that instead of treating uppercase letters as low values and lowercase letters as high values, the reverse applies: uppercase letters are high values and lowercase letters are low values. When base <= 36 the modifier is simply ignored, since there is no use for it. Another example: b62l"+z" equals 35 while b62"z" equals 61, counting digits 0-9 as 0-9, the low-value letter case as 10-35 and the high-value case as 36-61.

    This is doable in my character-by-character analysis because I've mostly implemented it already; I just have bugs here and there to squash before creating an actual library for general use. What I'd like to know from this thread is whether anyone sees potential issues, for example b/B already having a meaning when prefixed to string literals. I never checked, since I doubted it, so I'm going to work on those bugs and come back to see if anyone has replied with contrary information.

  2. #2
    C++ Witch laserlight (Join Date: Oct 2003, Location: Singapore, Posts: 28,413)
    C11 only specifies these encoding prefixes for string literals: u8, u, U, L. Hence, your b# and B# encoding prefixes would not conflict as a language extension.

    The syntax specified by the standard does not provide for encoding suffixes, and you may have to consider that adjacent string literals are concatenated, so there would be a conflict between your choice of L for a suffix and the standard use of L for a prefix. It may be simpler to just incorporate this into the prefix.

    Another consideration is that you have what appears to be a string literal, but from 'an example is b2"-1001" will result in -9' it sounds like you have in mind integer literals. If so, this is likely to be a bad idea because it introduces inconsistency and hence possible confusion. On the other hand, if you introduce entirely different syntax to generalise integer constants, it would be yet another new syntax to learn and parse for, whereas this language extension only (?) requires understanding that the type is wildly different from what one would expect from a string literal.

    Personally, I think reusing string literals for integer constants is the kind of thing that language designers will get mercilessly mocked for ten years down the road, but in the end it's your choice since you're the one implementing the language extension. (And who's to say whether people will or will not like whatever alternative you might come up with? Programmers can be quite a critical bunch when it comes to their pet language peeves.)
    Last edited by laserlight; 10-14-2019 at 04:32 AM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User awsdert
    Quote Originally Posted by laserlight
    C11 only specifies these encoding prefixes for string literals: u8, u, U, L. Hence, your b# and B# encoding prefixes would not conflict as a language extension.
    Thought so; I just needed to be sure.
    Quote Originally Posted by laserlight
    The syntax specified by the standard does not provide for encoding suffixes, and you may have to consider that adjacent string literals are concatenated, so there would be conflict between your choice of L for a suffix and the standard use of L for a prefix. It may be simpler to just incorporate this into the prefix.
    To begin with, this is supposed to be a new literal standard, so a non-conforming compiler should and would abort compilation at this point and just emit errors, so I'm not too worried about that. Also, since I'm writing my own precompiler for this anyway, people could run that first and pass the result on to gcc if they wanted to. Either way, the compiler I'm trying to write will support it properly and will treat any string literal using l or L as a suffix as invalid, aborting compilation with only the errors and warnings. I originally considered extending 0b and 0B to do something similar, but then it occurred to me that it could confuse decimal with binary, and even confuse where the base ends and the number begins. A clear way of delimiting the value is needed to know which base to expect. String literals were my first thought because they can be macro'd into a somewhat graceful fallback to a function call instead, like so:
    Code:
    #ifdef __misty__
    #define UBASE( BASE, VAL, SFX ) b##BASE##VAL##SFX
    #else
    #define UBASE( BASE, VAL, SFX ) ubasestr( BASE, VAL, #SFX )
    #endif
    // Terrible way to use this but fully possible
    // (empty third argument: no suffix)
    int main(void) { return UBASE( 10, "0", ); }
    Quote Originally Posted by laserlight
    Another consideration is that you have what appears to be a string literal, but from 'an example is b2"-1001" will result in -9' it sounds like you have in mind integer literals. If so, this is likely to be a bad idea because it introduces inconsistency and hence possible confusion. On the other hand, if you introduce entirely different syntax to generalise integer constants, it would be yet another new syntax to learn and parse for, whereas this language extension only (?) requires understanding that the type is wildly different from what one would expect from a string literal.
    Well, the other possibility I can think of is using the ' character. Compilers will definitely complain about the length, it lines up with multi-character constants like 'names', which are also generally understood to equate to an integer, and it is fully trivial to switch to in my code, so no issue there either.
    Quote Originally Posted by laserlight
    Personally, I think reusing string literals for integer constants is the kind of thing that language designers will get mercilessly mocked for ten years down the road, but in the end its your choice since you're the one implementing the language extension. (And whos's to say if people will or will not like whatever alternative you might come up with? Programmers can be quite a critical bunch when it comes to their pet language peeves.)
    They can mock me all they like. I at least tried to come up with something suitable and even consulted others for their ideas and concerns (this thread being the only instance so far). I'd like to see them do better.
    Last edited by awsdert; 10-14-2019 at 05:07 AM. Reason: Forgot the return in my code example

  4. #4
    Registered User awsdert
    For everyone's reference, I've finally squashed the bugs I could see on the integer front. I'm going to finish string handling next before moving on to floating-point numbers, which I forgot about. My current output:
    Code:
    make char.run (in directory: /run/media/zxuiji/ZXUIJI_1TB/github/mc)
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libnext.so -c next.c
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libtsc.so -c tsc.c
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libbase62.so -c base62.c
    cc -fPIC -Wall -Wno-multichar  -D OUT=char.elf -o ./char.elf char.c ./libnext.so ./libtsc.so ./libbase62.so
    ./char.elf
    char.c:692:literalc(): Error: 0x00000054, 84, Invalid or incomplete multibyte or wide character, Info:
    char.c:692:literalc(): Error: 0x00000054, 84, Invalid or incomplete multibyte or wide character, Info:
    std_encoding = 'UTF-8'
    Compiler Tests:
    'abcd': 0x61626364, 'd', dcba
    '\u2ea2': 0x00E2BAA2, '\Uffffffff', \Uffffffff\Uffffffff\Uffffffffa\u2ea2': 0x61E2BAA2, '\Uffffffff', \Uffffffff\Uffffffff\Uffffffff'\u2ea2a': 0xE2BAA261, 'a', a\Uffffffff\Uffffffff\Uffffffffnit tests:
    010 = 8         (signed)
    0b101 = 5         (signed)
    99999 = 99999         (signed)
    0x5f5e0ff = 99999999         (signed)
    -010 = -8         (signed)
    -0b101 = -5         (signed)
    -99999 = -99999         (signed)
    -0x5f5e0ff = -99999999         (signed)
    -1u = 4294967295u      (unsigned)
    -1uhh = 255uhh    (unsigned)
    -1ull = 18446744073709551615ull    (unsigned)
    -1ui10 = 1023ui10 (unsigned)
    -99 = -99         (signed)
    -99hh = -99hh       (signed)
    -99ll = -99ll       (signed)
    -99i10 = -99ui10   (signed)
    0xFFFFFFFFFFFFFFFF = -1         (signed)
    0xFFFFFFFFFFFFFFFFh = -1h        (signed)
    0xFFFFFFFFFFFFFFFFhh = -1hh       (signed)
    0xFFFFFFFFFFFFFFFFl = -1l        (signed)
    0xFFFFFFFFFFFFFFFFll = -1ll       (signed)
    0xFFFFFFFFFFFFFFFFi9 = -1ui9   (signed)
    0xFFFFFFFFFFFFFFFFu = 4294967295u      (unsigned)
    0xFFFFFFFFFFFFFFFFuh = 65535uh     (unsigned)
    0xFFFFFFFFFFFFFFFFuhh = 255uhh    (unsigned)
    0xFFFFFFFFFFFFFFFFul = 18446744073709551615ul     (unsigned)
    0xFFFFFFFFFFFFFFFFull = 18446744073709551615ull    (unsigned)
    0xFFFFFFFFFFFFFFFFui9 = 511ui9 (unsigned)
    b36"z" = 35         (signed)
    b16"fff"hh = -1hh       (signed)
    b16"fff"hH = -1hh       (signed)
    \u2ea2 =
    'abcd' =  61626364, 'dcba'
    '\u2ea2' =  00E2BAA2, '\Uffffffff\Uffffffff\Uffffffff'a\u2ea2' =  61E2BAA2, '\Uffffffff\Uffffffff\Uffffffff
    '\u2ea2a' =  E2BAA261, 'a\Uffffffff\Uffffffff\Uffffffff"\u2ea2" =  00A2BAE2, '\u2ea2'
    "abcd\u2ea2" =  64636261, 'abcd\u2ea2'
    "\u2ea2abcd" =  61A2BAE2, '\u2ea2abcd'
    "abcd\u2ea2wxyz" =  64636261, 'abcd\u2ea2wxyz'
    u8'abcd' =  61626364, 'dcba'
    u8'\u2ea2' =  00E2BAA2, '\Uffffffff\Uffffffff\Uffffffffu8'a\u2ea2' =  61E2BAA2, '\Uffffffff\Uffffffff\Uffffffff
    u8'\u2ea2a' =  E2BAA261, 'a\Uffffffff\Uffffffff\Uffffffffu8"\u2ea2" =  00A2BAE2, '\u2ea2'
    u8"abcd\u2ea2" =  64636261, 'abcd\u2ea2'
    u8"\u2ea2abcd" =  61A2BAE2, '\u2ea2abcd'
    u8"abcd\u2ea2wxyz" =  64636261, 'abcd\u2ea2wxyz'
    u'cd' =  00FFFE63, '
    u'\u2ea2' =  0000A22E, '
    u'a\u2ea2' =  2E6100A2, '
    u'\u2ea2a' =  00A22E61, '
    u"\u2ea2" =  00002EA2, '
    Last character was '        '
    Character hex: 00 00 00 00 00 00
    rm libtsc.so libbase62.so char.elf libnext.so
    char.c:692:literalc(): Error: 0x00000054, 84, Invalid or incomplete multibyte or wide character, Info:
    char.c:692:literalc(): Error: 0x00000054, 84, Invalid or incomplete multibyte or wide character, Info:
    make: *** [makefile:36: char.run] Error 1
    Compilation failed.

  5. #5
    Registered User awsdert
    New idea, this time without using a string at all. It definitely causes an error with cc (presumably redirecting to gcc), as I've just tested:
    Code:
    0~62~z
    Last edited by awsdert; 10-16-2019 at 05:57 PM. Reason: Forgot to remove brackets
