-
The problem is with the sign-extension and is easy to fix.
- Consider the char value -10.
- This is "11110110" in two's-complement binary.
- Now we implicitly convert this to a signed int by passing it to isprint().
- The value is sign-extended and we now have "11111111 11111111 11111111 11110110" in binary.
- The implementation casts this value to an unsigned int for comparison purposes.
- We now have the decimal value 4294967286. Obviously, this will cause problems if the implementation uses it as an index into a lookup table with 256 or 65,536 entries.
This is demonstrated with this very simple program.
Code:
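#include <stdio.h>

/* Print the same value as a signed and as an unsigned int. */
void demo(int c)
{
    printf("int_value: %d\n", c);
    printf("uint_value: %u\n", (unsigned int) c);
}

int main(void)
{
    char c = -10;

    demo(c);                 /* sign-extended where char is signed */
    demo((unsigned char) c); /* no sign-extension: 246 */
    getchar();
    return 0;
}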
The solution is to avoid the sign-extension by passing the value as an unsigned char.
Code:
isprint((unsigned char) 'ö')
However, it still doesn't return true. This is because the ctype functions use the "C" locale by default. In this locale, only the basic ASCII characters are classified as printable. We can either call setlocale to select a different locale, or use the Windows functions, which provide much more reliable international character support.
Code:
#include <stdio.h>
#include <ctype.h>
#include <locale.h>
#include <windows.h>

/* Treat every character that Windows does not class as a control
   character as printable. Returns 0 if the lookup fails. */
int win_isprint(UCHAR c)
{
    WORD char_type;

    if (GetStringTypeA(LOCALE_USER_DEFAULT, CT_CTYPE1, (LPCSTR) &c, 1, &char_type))
        return !(char_type & C1_CNTRL);
    else
        return 0;
}

int main(void)
{
    setlocale(LC_ALL, ".ACP"); /* Use the system's ANSI code page. */

    printf("CLib: %d\n", isprint((unsigned char) '\b'));
    printf("CLib: %d\n", isprint((unsigned char) 'a'));
    printf("CLib: %d\n", isprint((unsigned char) 'ö'));
    printf("Win: %d\n", win_isprint('ö'));
    printf("Win: %d\n", win_isprint('\b'));
    getchar();
    return 0;
}
Results:
Code:
CLib: 0
CLib: 258
CLib: 258
Win: 1
Win: 0
-
I know all of this. It's still counterintuitive. Especially as the same code works in C.
-
I do not see what the big deal is:
cout<<char(156);
works fine for me in devc++ and vc++
-
Quote:
I do not see what the big deal is:
The bolded part:
Quote:
works fine for me.
-
Just curious after reading through this post: why couldn't you just go to the Character Map, copy the symbol, and then paste it into your code like this...
-
Because of this (simplified explanation): the compiler has a target character encoding. The string literals inside the executable (including your "£\n") will be in this encoding. But the computer knows no such thing as a character in memory; it's all numbers. Displaying a character just means using the number as an index into a table of drawing instructions. This is what an encoding does: it specifies which number to use for which character. This number will then get stored in the executable.
There might be another encoding used on the target system. The same number stored in the executable could suddenly be a different character.
This is what is so unreliable about the whole thing.
Windows complicates matters: there is more than one encoding on a single platform. While the compiler will most likely use Windows' native Windows-1252 (or a variant, depending on where you live) encoding as its target, the interpretation of the characters, when written to the console, will happen using the IBM OEM-xxx encoding, where xxx is your code page number. (437 is American English, 850 is Western European, which includes German, ...)
Try it yourself. Take that code snippet, compile it as a console app and run. It won't be a pound sign printed to the console.
The problem is that you can't detect what character set is being used using standard C++. Your best bet, therefore, is to use wide-character strings. Using this code:
wcout << L"£\n";
should work. (Provided that you have a proper implementation of your standard streams. I'm not sure if there is such a thing.)