Thread: char detail detection

  1. #1
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657

    char detail detection

I'm making some kernel-safe code and was struggling with how to check limits safely when GCC's documented predefines are unavailable. This is what I've come up with for the CHAR* macros when no equivalent macros can be detected:
    Code:
    ...
    #define __UCHAR_MAX__ (0U | -('\x1'))
    ...
    #define __SCHAR_MAX__ (0 | (__UCHAR_MAX__ >> 1))
    ...
    #define __SCHAR_MIN__ ((-__SCHAR_MAX__)-1)
    ...
    /* Bitwise is a workaround for integer promotion */
    #define __CHAR_SIGNED__ (__SCHAR_MAX__ > ('\0' | __UCHAR_MAX__))
    ...
    #elif __CHAR_SIGNED__
    #define __CHAR_UNSIGNED__ 0
    #else
    #define __CHAR_UNSIGNED__ 1
    ...
    Just wanted to know if anyone sees a problem with these?
Still working on figuring out __CHAR_BIT__, should these turn out to be fine.

  2. #2
    Registered User
    Join Date
    Dec 2017
    Posts
    721
    I don't understand what you expect bitwise-oring something with 0 to do? It shouldn't do anything at all. And remember that character constants (like '\x1') are ints in C.

    It looks like pretty much all your macros are making ints (or unsigned ints).

    I'm not sure if there's a way to do this. In glibc limits.h they just hardcode the specific values.

    The GNU C Library
    (look under the include directory for limits.h)
    The world hangs on a thin thread, and that is the psyche of man. - Carl Jung

  3. #3
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    Quote Originally Posted by john.c View Post
    I don't understand what you expect bitwise-oring something with 0 to do?
    To maintain signedness in preprocessor

    Quote Originally Posted by john.c View Post
    remember that character constants (like '\x1') are ints in C.
    If that's the case what's the point in the wide character variants like L'\x1'?

    Quote Originally Posted by john.c View Post
    It looks like pretty much all your macros are making ints (or unsigned ints).
    As you will note via the __CHAR_SIGNED__ fallback this signedness is vital for it

    Quote Originally Posted by john.c View Post
    I'm not sure if there's a way to do this. In glibc limits.h they just hardcode the specific values.
    That's an absolute last resort, I'd rather not have to detect the CPU and system more than necessary

  4. #4
    Registered User
    Join Date
    Dec 2017
    Posts
    721
    Read this about character constants in C:
    character constant - cppreference.com

    On my computer your __CHAR_SIGNED__ macro gets the wrong value (unless 0 is supposed to mean signed). Have you tested any of this?

    If it was possible to do what you are trying to do then why don't they do it in glibc?
    The world hangs on a thin thread, and that is the psyche of man. - Carl Jung

  5. #5
    Registered User
    Join Date
    Feb 2019
    Posts
    557
    john.c is right and it is easy to demonstrate:

    - Your UCHAR_MAX symbol could be defined as:

    Code:
    #define __UCHAR_MAX__ ((unsigned char)-1)
    ORing with 0 can be discarded by the optimizer.

- You can force, by casting, __SCHAR_MAX__ and __SCHAR_MIN__ to be signed (no ORing needed)...

- I believe you can't implement a macro to get the size of a char in bits, like CHAR_BIT, but you can do it programmatically.

- the sizeof '\1' is always the same as sizeof(int) -- In fact, 6.4.4.4 §10 of ISO 9899:1999 says explicitly: "An integer character constant has type int..."

Code:
$ gcc -xc -include stdio.h - <<< "int main(){printf(\"%zu\n\",sizeof('\1'));}"; ./a.out
4
The wchar_t type is usually the same size as the int type as well:

    Code:
$ gcc -xc -include stdio.h -include wchar.h - \
<<< 'int main(){printf("%zu\n",sizeof(wchar_t)); }'; ./a.out
4
    But you can make it shorter with GCC option -fshort-wchar (good compilers will allow this as well):

    Code:
$ gcc -xc -include stdio.h -include wchar.h -fshort-wchar - \
<<< 'int main(){printf("%zu\n",sizeof(wchar_t)); }'; ./a.out
2

  6. #6
    Registered User
    Join Date
    Feb 2019
    Posts
    557
Another example... This is valid (the compiler will issue a warning, but the standard accepts this construction):

    Code:
    int x = 'abcd';
    printf("%#x\n", x );
Will print (with GCC; the value of a multi-character constant is implementation-defined): 0x61626364

  7. #7
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    Quote Originally Posted by john.c View Post
    Read this about character constants in C:
    character constant - cppreference.com

    On my computer your __CHAR_SIGNED__ macro gets the wrong value (unless 0 is supposed to mean signed). Have you tested any of this?

    If it was possible to do what you are trying to do then why don't they do it in glibc?
I had naively thought that character constants would be treated as char (as common sense dictates); I did not realise the C committee was this stupid. Were it not for the annoyingly hard-to-implement side of converting code to binary, I would be severely tempted to write my own compiler with a new language that is C-like but without certain crap like no standard predefines or non-guaranteed llong (even if it has to be supported via a long[2] array).

  8. #8
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    Just gonna fire an error when __CHAR*__ macros can't be forcefully defined then

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    27,271
    Quote Originally Posted by awsdert View Post
I had naively thought that character constants would be treated as char (as common sense dictates); I did not realise the C committee was this stupid. Were it not for the annoyingly hard-to-implement side of converting code to binary, I would be severely tempted to write my own compiler with a new language that is C-like but without certain crap like no standard predefines or non-guaranteed llong (even if it has to be supported via a long[2] array).
    If you ever do this and your language becomes popular, I suspect that you'll find young'uns in a few decades complaining about how stupid you were when you designed your programming language
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  10. #10
    Registered User
    Join Date
    Feb 2019
    Posts
    557
Why is it "stupid" to store a character in multiple bytes? If you use charsets such as UTF-8, for example, one single character can be stored in 1 to 6 bytes:

    Code:
    $ man 7 utf8
If you are dealing with a single-byte charset (examples: ASCII and ISO-8859-1), how do you test for EOF with functions such as fgetc()?

  11. #11
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
    Quote Originally Posted by flp1969 View Post
Why is it "stupid" to store a character in multiple bytes? If you use charsets such as UTF-8, for example, one single character can be stored in 1 to 6 bytes:

    Code:
    $ man 7 utf8
If you are dealing with a single-byte charset (examples: ASCII and ISO-8859-1), how do you test for EOF with functions such as fgetc()?
Remember those integer promotion rules? Where an integer is only promoted if the integer in question goes beyond the default size? That should've applied to character constants too; instead they made a mashup of it by fixing their size at something bigger than it should be.

  12. #12
    Registered User
    Join Date
    Feb 2019
    Posts
    557
It applies to the char type too. But you are talking about "promotion", defined as "the act of raising to a higher position or rank". What happens the other way around is a demotion... Or are you saying this should be prohibited?

    Code:
    char c = -1;

  13. #13
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    657
Just accidentally deleted my own post. The code you're mentioning is runtime code, which is unrelated to the preprocessor; also, the demotion stuff is unaffected by a change to the default size of character constants.

  14. #14
    Registered User
    Join Date
    Feb 2019
    Posts
    557
I'm showing that an int literal is demoted to a char type by the compiler without any problem, in response to your promotion question and the "certain crap" remark about the language standard...

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    37,537
    I thought we were talking about integer arithmetic in the pre-processor.

    There is NO promotion in the pre-processor.
    Quote Originally Posted by c99
    The resulting tokens compose the controlling constant expression which is evaluated according to the rules of
    6.6, except that all signed integer types and all unsigned integer types act as if they have
    the same representation as, respectively, the types intmax_t and uintmax_t defined
    in the header <stdint.h>. This includes interpreting character constants, which may
    involve converting escape sequences into execution character set members. Whether the
    numeric value for these character constants matches the value obtained when an identical
    character constant occurs in an expression (other than within a #if or #elif directive)
    is implementation-defined. 132) Also, whether a single-character character constant may
    have a negative value is implementation-defined.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
