Thread: Character Sets

  1. #1
    Registered User
    Join Date
    Jun 2008

    Character Sets

    What part of a computer/application handles character sets?

  2. #2
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    United States
    uh, charmap?

  3. #3
    Registered User
    Join Date
    Jun 2008
    No, I mean where the character encodings are held in your computer. When you call printf with something like "(char)65", how does the terminal know to print the letter 'A'? Where is it decided that the number 65 will be used to store the letter 'A'?

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Farncombe, Surrey, England
    In a character map. In the really ancient days (and still during BIOS startup), this was a ROM image containing a bitmap for each of the 256 possible characters, and the hardware would copy the bits onto the screen in rows of pixels from each character.

    In a modern system, we have a font rasterizer that either draws a bitmap from a template bitmap or draws it from a vectorized form. Basically, a Z would be drawn as 0,0 -> 1,0; 1,0 -> 0,1; 0,1 -> 1,1, if we assume a capital letter sits in a basic bounding box with sides of 1.0. Some letters DO stick out of the basic bounding box, such as 'g'.

    It gets much more interesting when you deal with cursive scripts, such as Arabic, where some of the letters connect across many other letters [or at least, so I understand from talking to colleagues who deal with font matters where I work].

  5. #5
    Registered User
    Join Date
    Jun 2008
    So the characters come from fonts (same as charmaps?), which are pictures of letters based on standards such as ASCII and Unicode?

  6. #6
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Yes, ultimately the fonts are maps from integer value to the glyph actually drawn.

    In reality, it's a lot more complicated. The integers may be remapped before being used in drawing: the WinAPI, for example, converts every ANSI string to UTF-16. Several codepoints (integers) may be combined and looked up in the font as a single entity. This happens for ligatures (the sequence "fi", for example, is often drawn combined), for some scripts like Arabic, and sometimes for combined characters. Unicode allows letters like Ä to be represented in two forms: either as the single codepoint Ä (U+00C4), or as an A followed by a combining diaeresis (U+0308). The two parts may be looked up separately in the font, or they may be combined first and then looked up.
