Thread: tolower and locale

  1. #16
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> An E with an acute accent, for instance, has more than one codepoint in Unicode
    That's mixing terms a bit. A "code point" is the number Unicode assigns to a character, e.g. U+00C9. The single unit of storage for a particular Unicode encoding is called a "code unit". So for UTF8 each byte is a code unit, for UTF16 each 16-bit word is a code unit, etc... It can take multiple code units to represent a single Unicode character.
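
    To make the "several storage units per character" point concrete, here's a rough sketch that counts how many Unicode values (the U+xxxx numbers) a UTF8 string encodes, as opposed to how many bytes it occupies. The name utf8_count is my own, not a library function, and it assumes the input is already valid UTF8.

```c
#include <stddef.h>
#include <string.h>

/* Count the Unicode values encoded in a NUL-terminated UTF-8 string.
   Assumption: the input is valid UTF-8; nothing is validated here.
   Continuation bytes have the bit pattern 10xxxxxx, so every byte
   that does NOT match that pattern starts a new encoded value. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

    For the two-byte string "\xC3\x89" (E with acute in UTF8), strlen() reports 2 while utf8_count() reports 1.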

    >> So those letters would be what -- a single byte outside of the ascii range?
    There are various "code pages" that map an integral value to a character glyph. A single-byte code page would contain a maximum of 256 mappings. For any ASCII-based code page (that I know of) the characters between 0 and 0x7F (127) are always the same. These are your basic "ascii" characters.
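
    As a small illustration of the 0 to 0x7F range, here's a sketch of a check for whether a buffer holds only those "ascii" values, in which case it should read the same under any ASCII-based code page. The function name is_plain_ascii is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if every byte is in the range 0..0x7F.
   Bytes above 0x7F are the ones whose meaning depends on
   which code page (or encoding) is in effect. */
bool is_plain_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] > 0x7F)
            return false;
    }
    return true;
}
```

    A buffer holding the single byte 0xC9 fails this check, which is the cue that you need to know the code page before interpreting it.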

    To illustrate, here's a great site that indexes many character glyphs to the values they have under various code pages. Here is "Latin Capital Letter E With Acute" - http://www.tachyonsoft.com/uc0000.htm#U00C9
    As you can see, it can have the following single-byte values, depending on the code page: 0x90, 0xC9, 0x4A, 0x71, or 0xE0.
    Under Unicode (UTF32 or UCS-4) it's U+000000C9.
    Encoded with UTF8 it's the two bytes 0xC3 0x89.
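
    If you're curious where 0xC3 0x89 comes from, here's a minimal sketch of the UTF8 bit packing. utf8_encode is a name I picked for illustration, not a standard library call, and the caller is assumed to pass a buffer with room for 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode value as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp cannot be encoded
   (above U+10FFFF, or in the surrogate range U+D800..U+DFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

    Calling utf8_encode(0xC9, buf) returns 2 and leaves 0xC3 0x89 in buf: the top bits of 0xC9 go into the 110xxxxx lead byte and the low six bits into the 10xxxxxx continuation byte.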

    Even though UTF8 can use multiple bytes to encode a single character, don't confuse it with a multi-byte code page. UTF8 is just an 8bit encoding of Unicode characters. (So you can kinda think of Unicode as a really big, 32bit "code page".)

    Here are examples of multi-byte code pages in Windows: http://www.microsoft.com/globaldev/reference/WinCP.mspx
    They call it "DBCS" (double-byte character set) for obvious reasons.

    gg

  2. #17
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    hmmm....
