Why one character size but two printed on screen

  1. #1
    Registered User
    Join Date
    Nov 2005
    Posts
    20

    Hello,

    I would like to know why, when I print one byte to the screen, it shows two characters.
    For example, in French, printing a single byte such as "oe" (joined together as a ligature, ASCII code 0x9c) gives me two characters on screen.
    Is there a function other than strlen() that counts exactly how many characters will be printed?

    Thank you

  2. #2
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,159
    Can you show us your code?
    If you understand what you're doing, you're not learning anything.

  3. #3
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,892
    The oe doesn't have an ASCII code. ASCII is a 7-bit encoding that only encodes 128 characters, quite a few of them special control characters. It only contains the basic English alphabet, the digits 0 through 9, and a few punctuation and whitespace characters. It does not contain any diacritics or foreign characters at all.

    The code you're referring to could be the code the character has in the old IBM PC codepage (now referred to as OEM in Windows), or it could be the Windows-1252 codepage (standard Windows "ANSI" codepage), or it could be the ISO-8859-1 codepage (very similar, and very common - it's standard on most Linux systems).

    However, what is happening is that your source file is actually in the UTF-8 encoding, where the oe character needs 2 bytes to be encoded. The runtime still interprets it as something else (probably ISO-8859-1, as I've just noticed this is the Linux forum) and assumes that each byte is a single character. Thus it writes two characters.

    There's no really good solution. Character encodings are one area of C/C++ that I find truly lacking.
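
    That said, if the bytes are known to be valid UTF-8, a rough character count is possible without any library support. The sketch below is only an illustration: the function name utf8_count is made up for this example, and it counts code points (every byte that is not a 10xxxxxx continuation byte), which is not always the same as the number of columns drawn on screen (combining marks and wide characters will differ).

    ```c
    #include <stddef.h>

    /* Hypothetical helper, not a standard function.
     * Counts UTF-8 code points in a NUL-terminated string.
     * Assumption: the input is valid UTF-8. Continuation bytes have the
     * bit pattern 10xxxxxx, so every other byte starts a new code point. */
    size_t utf8_count(const char *s)
    {
        size_t n = 0;

        for (; *s != '\0'; ++s) {
            /* Mask the top two bits; 0x80 there means "continuation byte". */
            if (((unsigned char)*s & 0xC0) != 0x80)
                ++n;
        }
        return n;
    }
    ```

    For the string "c\xC5\x93ur" (the word with the oe ligature encoded as the two bytes C5 93), strlen() reports 5 while utf8_count() reports 4. If the text is in a single-byte codepage instead, strlen() already gives the character count and this function should not be used.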
    Last edited by CornedBee; 08-08-2006 at 10:22 AM.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  4. #4
    Registered User
    Join Date
    Nov 2005
    Posts
    20
    I cannot post my code here. In fact, my strings are extracted
    from a text file on disk.
    The text is raw (only \r and \n as line breaks) and was typed on MS Windows
    in French. The typeface is Courier New.

    I tried to work with UTF-8 (as output only), but I have problems with accents.
    I will post another thread about French accents in raw text:
    "How to convert raw text with accent to UTF8"
