Thread: wrong/right ASCII?

  1. #1
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587

    wrong/right ASCII?

    When I execute the following code:
    Code:
    #include <stdio.h>
    
    int main()
      {
      char x='ä';   //please note that this is not alphabet 'a'
      int y=(unsigned char)x;
      printf("%d",y);
      return 0;
      }
    I get the output as 228.
    But the output should be 132, which corresponds to that extended
    ASCII character according to http://www.asciitable.com/
    Why do I get this, and what am I missing here?
    Thanks in advance.
    Not everything that can be counted counts, and not everything that counts can be counted
    - Albert Einstein.


    No programming language is perfect. There is not even a single best language; there are only languages well suited or perhaps poorly suited for particular purposes.
    - Herbert Mayer

  2. #2
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    132 is correct, yes. Perhaps you are seeing the wonders of signedness! Behold!

    Check this out.

    Example 1:
    Code:
    #include <stdio.h>
    
    int main()
      {
      unsigned char x='ä';   //please note that this is not alphabet 'a'
      int y=(unsigned char)x;
      printf("%d",y);
      return 0;
      }
    Example 2:
    Code:
    #include <stdio.h>
    
    int main()
      {
      char x='ä';   //please note that this is not alphabet 'a'
      int y= 255 + x;
      printf("%d",y);
      return 0;
      }

  3. #3
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587
    Well, your first example gives the same output, 228.
    The second one shows 227 because the value of the signed character corresponding to 'ä' is -28.
    But shouldn't your first one output 132? (This is what I am not getting.)
    Thanks for your quick reply.

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    According to my character map, ä is U+00E4, which means I typed that character into this little textbox with Alt-0228. Coincidence?

  5. #5
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    I suggest you avoid Swedish characters in chars, because they are not very good for that.
    To use international characters, you really should be using Unicode or wide chars.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  6. #6
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587
    According to my character map, ä is U+00E4, which means I typed that character into this little textbox with Alt-0228. Coincidence?
    Yeah, I get it too.
    Is the listing on www.asciitable.com wrongly ordered, then?

    I suggest you avoid Swedish characters in chars, because they are not very good for that.
    To use international characters, you really should be using Unicode or wide chars.
    :-) I was just experimenting with extended ASCII characters.
    Thanks both of you.

  7. #7
    Registered User OnionKnight's Avatar
    Join Date
    Jan 2005
    Posts
    555
    What makes you think your system uses extended ASCII? In latin1, 'ä' is 228. But why are you worried about the code for the character in the first place?

  8. #8
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587
    Quote Originally Posted by OnionKnight View Post
    What makes you think your system uses extended ASCII? In latin1, 'ä' is 228. But why are you worried about the code for the character in the first place?
    You are right.
    It must be using something else.
    Stupid me, that thought never crossed my feeble mind.
    Thanks OnionKnight.

  9. #9
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by stevesmithx View Post
    When I execute the following code:
    Code:
    #include <stdio.h>
    
    int main()
      {
      char x='ä';   //please note that this is not alphabet 'a'
      int y=(unsigned char)x;
      printf("%d",y);
      return 0;
      }
    I get the output as 228.
    But the output should be 132, which corresponds to that extended
    ASCII character according to http://www.asciitable.com/
    Why do I get this, and what am I missing here?
    Thanks in advance.
    What you're missing is that there is no such ASCII character. The character set you need, by definition, is not "ASCII" but this "Extended ASCII" thing you refer to in your link. There is no guarantee that your terminal is using this character set, or for that matter your text editor either.

    One day I plan to sit down and write up a FAQ about wide characters, character sets, and encodings, because this stuff is more complicated than it first appears, as evidenced by questions like "How do I print this extended ASCII character" when there is in fact no such thing as "extended ASCII."
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  10. #10
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587
    Quote Originally Posted by brewbuck View Post
    What you're missing is that there is no such ASCII character. The character set you need, by definition, is not "ASCII" but this "Extended ASCII" thing you refer to in your link. There is no guarantee that your terminal is using this character set, or for that matter your text editor either.

    One day I plan to sit down and write up a FAQ about wide characters, character sets, and encodings, because this stuff is more complicated than it first appears, as evidenced by questions like "How do I print this extended ASCII character" when there is in fact no such thing as "extended ASCII."
    Kindly do it soon, I wanna learn it!

    Off-topic:
    BTW, it was very cool of you to use the Rabin-Miller algorithm for that prime number contest.
    Before that I had never heard of probabilistic algorithms.

  11. #11
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Any characters within your source file that are not within the "basic character set" are handled in an implementation-defined manner. The basic character set is:
    Code:
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
    So to understand what's going on, you first need to know about code pages. A code page defines what glyph you should see for a particular 8-bit character code. The "basic character set" typically has the same glyph-to-code mapping in all code pages - making them nice and portable. If you go outside this basic set of characters, you then need to know how your text editor is saving the file - and make sure that the format is compatible with the "implementation defined" behavior of your compiler.

    Here's a nice site that lists several Unicode characters along with any code pages in which the character appears. Here is "Latin Small Letter A With Diaeresis": http://www.tachyonsoft.com/uc0000.htm#U00E4
    As you can see, it has a character code of 0x84 in a handful of code pages, and 0xE4 in a handful of others.

    In Windows, your typical text editor will encode a plain text file using the system's "ANSI Code Page" (ACP). For my system, that's 1252, where ä == 0xE4 or 228.

    So let's say you've been real careful to ensure that your source code contains the character code that you expect - let's say '\xE4'. This still isn't of very much use because now you have to worry about what code page the console is actually using to display characters.
    Under Windows, there's the additional hassle of knowing what font the console is using. For example, on my machine, if the console is using "raster fonts", then the glyphs are limited to what's in code page 437 (DOS USA) because my installation is localized to the US (making 437 the default "OEM" charset - or the default console output code page). You can change your console's font to use something like Lucida Console, which has several Unicode characters. In this case, the system will take the current console output code page and try to find a matching Unicode glyph.

    So if you set your console to use Lucida Console font, the following code will print ä twice:
    Code:
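        // Windows only; needs <windows.h> and <stdio.h>
        SetConsoleOutputCP(1252);   // ä is 0xE4 in code page 1252
        puts("\xE4");
        
        SetConsoleOutputCP(437);    // ä is 0x84 in code page 437
        puts("\x84");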
    Next tutorial: What's wchar_t good for?

    gg

  12. #12
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote Originally Posted by Codeplug View Post
    Any characters within your source file that are not within the "basic character set" are handled in an implementation-defined manner. The basic character set is:
    Code:
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
    Aren't you forgetting: space, tab, CR, LF, NUL?
    "I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008

    "the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010

  13. #13
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Yeah, something like that. C99 says:
    ... the space character, and control characters representing horizontal tab, vertical tab, and form feed.
    gg

  14. #14
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587
    Wow. That's really a lot of useful information.
    Thank you so much for taking the time to write that clear and detailed explanation.
    Thanks again for sharing, Codeplug.
