Thread: tolower and locale

  1. #1
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by tabstop View Post
    The C standard says:
    Okay!!
    Phew -- except how could this apply to multi-byte utf8 chars? Which I imagine is all those little accented letters.
    Last edited by MK27; 02-03-2009 at 03:20 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by MK27 View Post
    Phew -- except how could this apply to multi-byte utf8 chars? Which I imagine is all those little accented letters.
    That's why I mentioned towlower -- wouldn't you be using wide characters for utf8?

  3. #3
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by tabstop View Post
    That's why I mentioned towlower -- wouldn't you be using wide characters for utf8?
    Oh, I'm not using them at all. I thought a wide character was one that actually occupied more screen space. That's fine, since the character count will be the same and I imagine "wide character" alphabets (like ideograms) do not really use upper and lower case. Although that raises the question: what is towlower for?

    But the Romance languages, etc., contain a lot of "modified" ASCII characters (an e with an accent, etc.) which I presume, since they are not part of ASCII, must be UTF-8, and can be capitalized (an E with an accent).

  4. #4
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote Originally Posted by tabstop View Post
    wouldn't you be using wide characters for utf8?
    They wouldn't be UTF-8 if they were wide chars, they'd be UTF-16 or UTF-32...
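To make the byte-versus-character distinction concrete, here is a minimal sketch in C. The function name `utf8_code_points` is hypothetical, and the code assumes its input is already valid UTF-8. It counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts code points in a string that is assumed
   to be valid UTF-8. No validation is performed. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
           starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

An accented letter such as é is the two bytes 0xC3 0xA9 in UTF-8, so strlen reports 2 for it while this function reports 1. The same letter stored in a wchar_t would be a single element.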
    "I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008

    "the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010

  5. #5
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by cpjust View Post
    They wouldn't be UTF-8 if they were wide chars, they'd be UTF-16 or UTF-32...
    Fair enough then (apparently character handling needs to be my next project). For whatever reason, I was thinking the shift status (or whatever it's called) would be used for this sort of thing. Maybe not.

  6. #6
    Registered User
    Join Date
    Dec 2008
    Location
    Black River
    Posts
    128
    Quote Originally Posted by MK27 View Post
    how could this apply to multi-byte utf8 chars? Which I imagine is all those little accented letters.
    I believe the parameter for tolower / toupper must be representable as an unsigned char, that is, in the range [0 .. UCHAR_MAX], or equal to EOF, which would make it useless for multi-byte characters.
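A small sketch of what that limitation looks like in practice. The helper name `lower_bytes` is hypothetical, and the behaviour described assumes the default "C" locale that every program starts in. Each byte is cast to unsigned char before the call, since passing a negative char value to tolower is undefined.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: lowercases a string in place, one byte at a time. */
void lower_bytes(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* The cast keeps bytes >= 0x80 in the range tolower accepts. */
        *s = (char)tolower((unsigned char)*s);
    }
}
```

Applied to the UTF-8 bytes of "ÉCOLE", this lowercases the ASCII letters but leaves the two bytes of É (0xC3 0x89) untouched, because in the "C" locale tolower only maps A through Z.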

    Quote Originally Posted by tabstop
    wouldn't you be using wide characters for utf8?
    The standard says nothing about the encoding of wide characters. However, on most Unix platforms, wchar_t represents UTF-32 code points (a property you can check by verifying the existence of the __STDC_ISO_10646__ macro), whereas on Windows, it usually represents UCS-2 or UTF-16. So one could use towlower / towupper and then convert back to UTF-8 according to the host platform.
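A rough sketch of that round trip, under stated assumptions: the names `use_utf8_locale` and `mb_lower` are hypothetical, the locale names "C.UTF-8" and "en_US.UTF-8" are guesses that may not be installed on a given system, and the result depends on the case tables of whichever locale is active.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: tries two commonly available UTF-8 locale names.
   Returns 1 if one of them could be selected, 0 otherwise. */
int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper: lowercases a multibyte string according to the
   current LC_CTYPE locale. Returns 0 on success, -1 if the input is not
   valid in that locale, memory runs out, or out is too small. */
int mb_lower(const char *in, char *out, size_t outsize)
{
    mbstate_t st;
    const char *src = in;
    const wchar_t *wsrc;
    wchar_t *w;
    size_t n, need, i;
    int rc = -1;

    assert(in != NULL && out != NULL);

    /* First pass: count the wide characters needed. */
    memset(&st, 0, sizeof st);
    n = mbsrtowcs(NULL, &src, 0, &st);
    if (n == (size_t)-1)
        return -1;

    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;

    /* Second pass: convert, including the terminating null. */
    src = in;
    memset(&st, 0, sizeof st);
    mbsrtowcs(w, &src, n + 1, &st);

    for (i = 0; i < n; i++)
        w[i] = (wchar_t)towlower((wint_t)w[i]);

    /* Convert back only if the result fits, terminator included. */
    wsrc = w;
    memset(&st, 0, sizeof st);
    need = wcsrtombs(NULL, &wsrc, 0, &st);
    if (need != (size_t)-1 && need < outsize) {
        wsrc = w;
        memset(&st, 0, sizeof st);
        wcsrtombs(out, &wsrc, outsize, &st);
        rc = 0;
    }

    free(w);
    return rc;
}
```

A caller would run use_utf8_locale() once at startup and then pass UTF-8 strings to mb_lower. On platforms where wchar_t is 16 bits, characters outside the Basic Multilingual Plane may not survive this per-element approach.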
    Last edited by Ronix; 02-03-2009 at 03:35 PM.
