Thread: How is tolower() implemented if..

  1. #1
    Registered User (Join Date: Sep 2018, Posts: 217)

    How is tolower() implemented if..

    How is tolower() implemented by the standard library if the letters of the alphabet are not guaranteed to be contiguous in the character set?

  2. #2
    Registered User (Join Date: Sep 2020, Posts: 425)
    Here's one implementation from newlib/libc.

    Code:
    int
    _DEFUN(tolower,(c),int c)
    {
    #if defined (_MB_EXTENDED_CHARSETS_ISO) || defined (_MB_EXTENDED_CHARSETS_WINDOWS)
      if ((unsigned char) c <= 0x7f)
        return isupper (c) ? c - 'A' + 'a' : c;
      else if (c != EOF && MB_CUR_MAX == 1 && isupper (c))
        {
          char s[MB_LEN_MAX] = { c, '\0' };
          wchar_t wc;
          if (mbtowc (&wc, s, 1) >= 0
              && wctomb (s, (wchar_t) towlower ((wint_t) wc)) == 1)
            c = (unsigned char) s[0];
        }
      return c;
    #else
      return isupper(c) ? (c) - 'A' + 'a' : c;
    #endif
    }
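
    The extended-charset branch is easier to follow in isolation: it converts the single byte to a wide character, lowercases that, and converts back, keeping the result only if it still fits in one byte. A minimal sketch of that round trip follows. The helper name lower_via_wide is made up for this example (it is not part of newlib), and a single-byte locale is assumed.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stdio.h>    /* EOF */
#include <stdlib.h>   /* mbtowc, wctomb */
#include <wchar.h>
#include <wctype.h>   /* towlower */

/* Hypothetical helper, not newlib code: lowercase one byte by going
   through the wide-character functions of the current locale. */
int lower_via_wide(int c)
{
    char s[MB_LEN_MAX] = { (char) c, '\0' };
    wchar_t wc;

    if (c == EOF)
        return c;

    /* Byte -> wide character; give up if the byte is not a valid character. */
    if (mbtowc(&wc, s, 1) < 0)
        return c;

    /* Lowercase the wide character and convert it back to bytes.
       Accept the result only if it is exactly one byte long. */
    if (wctomb(s, (wchar_t) towlower((wint_t) wc)) == 1)
        return (unsigned char) s[0];

    return c;
}
```

    In the default "C" locale this behaves like the plain ASCII version for the letters A to Z; the wide-character detour only matters for letters above 0x7f in ISO or Windows code pages.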

  3. #3
    Registered User (Join Date: Sep 2018, Posts: 217)
    Wow, that is complicated. Thanks!

  4. #4
    Registered User (Join Date: May 2012, Posts: 505)
    Originally posted by Nwb:
    "How is tolower() implemented by the standard library if the letters of the alphabet are not guaranteed to be contiguous in the character set?"
    You can implement it portably like this.

    Code:
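    #include <string.h>

    int tolower(int ch)
    {
       const char *upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
       const char *lower = "abcdefghijklmnopqrstuvwxyz";

       /* strchr also matches the terminating '\0', so rule that out first */
       const char *ptr = ch ? strchr(upper, ch) : NULL;
       if (ptr)
          return lower[ptr - upper];
       else
          return ch;
    }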
    However, a library implementation usually doesn't have to be portable, so it can take advantage of the platform's specific encoding.
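
    As an illustration of what taking advantage of the encoding can look like: in ASCII the upper-case and lower-case letters are each contiguous and differ only in bit 0x20. A minimal sketch under that assumption (the name ascii_tolower is made up for this example, and the function gives wrong answers on a non-ASCII encoding such as EBCDIC):

```c
/* Hypothetical ASCII-only variant; not portable to other encodings. */
int ascii_tolower(int ch)
{
    /* In ASCII, 'A'..'Z' are 0x41..0x5A and 'a'..'z' are 0x61..0x7A,
       so setting bit 0x20 turns an upper-case letter into lower case. */
    if (ch >= 'A' && ch <= 'Z')
        return ch | 0x20;
    return ch;
}
```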
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  5. #5
    Registered User (Join Date: Dec 2017, Posts: 1,634)
    As Malcolm said, an implementation doesn't need to be portable, so it can take advantage of the particular character set.

    The newlib implementation isn't as complicated as it seems. If the #else branch is compiled it becomes:
    Code:
    int tolower (int c)
    {
      return isupper(c) ? c - 'A' + 'a' : c;
    }
    isupper, in the newlib library, is:
    Code:
    int isupper (int c)
    {
        return ((__CTYPE_PTR[c+1] & (_U|_L)) == _U);
    }
    which uses a lookup table, a common approach for character classification.
    Presumably the c + 1 offset is there so that EOF (-1) indexes the first table entry.
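
    To make the offset concrete: EOF is a valid argument to the <ctype.h> functions and is negative, so a table with one extra leading slot lets EOF land on element 0 instead of reading before the array. A cut-down sketch with made-up names (my_table, my_isupper), assuming EOF == -1 and filling in only the letter flags:

```c
#include <stdio.h>   /* EOF */

#define MY_U 01   /* upper case, same value as newlib's _U */
#define MY_L 02   /* lower case, same value as newlib's _L */

/* Slot 0 is for EOF; slots 1..256 are for unsigned char values 0..255.
   Entries not listed are zero, i.e. "no flags". */
static const unsigned char my_table[1 + 256] = {
    ['A' + 1] = MY_U, ['B' + 1] = MY_U, ['C' + 1] = MY_U, ['D' + 1] = MY_U,
    ['E' + 1] = MY_U, ['F' + 1] = MY_U, ['G' + 1] = MY_U, ['H' + 1] = MY_U,
    ['I' + 1] = MY_U, ['J' + 1] = MY_U, ['K' + 1] = MY_U, ['L' + 1] = MY_U,
    ['M' + 1] = MY_U, ['N' + 1] = MY_U, ['O' + 1] = MY_U, ['P' + 1] = MY_U,
    ['Q' + 1] = MY_U, ['R' + 1] = MY_U, ['S' + 1] = MY_U, ['T' + 1] = MY_U,
    ['U' + 1] = MY_U, ['V' + 1] = MY_U, ['W' + 1] = MY_U, ['X' + 1] = MY_U,
    ['Y' + 1] = MY_U, ['Z' + 1] = MY_U,
    ['a' + 1] = MY_L, ['b' + 1] = MY_L, ['c' + 1] = MY_L, ['d' + 1] = MY_L,
    ['e' + 1] = MY_L, ['f' + 1] = MY_L, ['g' + 1] = MY_L, ['h' + 1] = MY_L,
    ['i' + 1] = MY_L, ['j' + 1] = MY_L, ['k' + 1] = MY_L, ['l' + 1] = MY_L,
    ['m' + 1] = MY_L, ['n' + 1] = MY_L, ['o' + 1] = MY_L, ['p' + 1] = MY_L,
    ['q' + 1] = MY_L, ['r' + 1] = MY_L, ['s' + 1] = MY_L, ['t' + 1] = MY_L,
    ['u' + 1] = MY_L, ['v' + 1] = MY_L, ['w' + 1] = MY_L, ['x' + 1] = MY_L,
    ['y' + 1] = MY_L, ['z' + 1] = MY_L,
};

int my_isupper(int c)
{
    /* Assumes EOF == -1, so EOF reads slot 0 (no flags set). */
    return (my_table[c + 1] & (MY_U | MY_L)) == MY_U;
}
```

    Because the initializer is indexed by the character constants themselves, this particular table does not care whether the letters are contiguous in the execution character set.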

    Part of the table (there's a lot more detail than this; see /newlib/libc/ctype/ctype_.c):
    Code:
    #define _U    01     // upper
    #define _L    02     // lower
    #define _N    04     // numeric
    #define _S    010    // whitespace
    #define _P    020    // punctuation
    #define _C    040    // control
    #define _X    0100   // hex
    #define _B    0200   // blank
     
    #define _CTYPE_DATA_0_127 \
        _C,    _C,    _C,    _C,    _C,    _C,    _C,    _C, \
        _C,    _C|_S, _C|_S, _C|_S, _C|_S, _C|_S, _C,    _C, \
        _C,    _C,    _C,    _C,    _C,    _C,    _C,    _C, \
        _C,    _C,    _C,    _C,    _C,    _C,    _C,    _C, \
        _S|_B, _P,    _P,    _P,    _P,    _P,    _P,    _P, \
        _P,    _P,    _P,    _P,    _P,    _P,    _P,    _P, \
        _N,    _N,    _N,    _N,    _N,    _N,    _N,    _N, \
        _N,    _N,    _P,    _P,    _P,    _P,    _P,    _P, \
        _P,    _U|_X, _U|_X, _U|_X, _U|_X, _U|_X, _U|_X, _U, \
        _U,    _U,    _U,    _U,    _U,    _U,    _U,    _U, \
        _U,    _U,    _U,    _U,    _U,    _U,    _U,    _U, \
        _U,    _U,    _U,    _P,    _P,    _P,    _P,    _P, \
        _P,    _L|_X, _L|_X, _L|_X, _L|_X, _L|_X, _L|_X, _L, \
        _L,    _L,    _L,    _L,    _L,    _L,    _L,    _L, \
        _L,    _L,    _L,    _L,    _L,    _L,    _L,    _L, \
        _L,    _L,    _L,    _P,    _P,    _P,    _P,    _C
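
    Since each table entry is a set of flag bits, other <ctype.h> style predicates can be written as a single mask test. A sketch of the idea, with a hypothetical flags_of() standing in for the table lookup and assuming ASCII; the masks are the natural ones for these flags and are not copied from newlib's headers. Note that the digits in the table carry only _N, not _X, so a hex-digit test has to accept either bit.

```c
/* Same bit values as the flags listed above, under made-up names. */
enum { F_U = 01, F_L = 02, F_N = 04, F_X = 0100 };

/* Hypothetical stand-in for table[c + 1]; assumes ASCII. */
static int flags_of(int c)
{
    int f = 0;
    if (c >= 'A' && c <= 'Z') f |= F_U;
    if (c >= 'a' && c <= 'z') f |= F_L;
    if (c >= '0' && c <= '9') f |= F_N;
    if ((c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')) f |= F_X;
    return f;
}

/* Each predicate is one AND against the flags it accepts. */
int my_isalpha(int c)  { return (flags_of(c) & (F_U | F_L)) != 0; }
int my_isalnum(int c)  { return (flags_of(c) & (F_U | F_L | F_N)) != 0; }
int my_isxdigit(int c) { return (flags_of(c) & (F_X | F_N)) != 0; }
```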
    Last edited by john.c; 10-09-2020 at 05:15 PM.
    A little inaccuracy saves tons of explanation. - H.H. Munro

  6. #6
    Registered User (Join Date: Sep 2018, Posts: 217)
    Thanks a lot, Malcolm McLean and john.c! So it's better to use the standard library whenever possible, huh?
    my 200th post. yey.
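
    One caveat when calling the library version: tolower() takes an int whose value must be representable as unsigned char or be equal to EOF, so a plain char that may be negative should be cast first. A small sketch with a hypothetical helper, lower_in_place:

```c
#include <ctype.h>

/* Hypothetical helper: lowercase a NUL-terminated string in place. */
void lower_in_place(char *s)
{
    for (; *s != '\0'; ++s) {
        /* Cast to unsigned char so a negative char is never passed on. */
        *s = (char) tolower((unsigned char) *s);
    }
}
```

    With a table-based implementation, a negative value other than EOF would index in front of the table, which is why the cast matters in practice.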
