is this ever possible?

**BEN10** · 06-27-2009

Hello,
I was reading through some "C programming gotchas" on the internet and found this issue. For the code below:

Code:

#include<stdio.h>
int main(void)
{
	char ch;
	ch=getchar();
	while(ch!=EOF)
	{
		putchar(ch);
		ch=getchar();
	}
}

This is what they've quoted:

The loop may never terminate: if char is an unsigned type then EOF will be converted to some positive value. On systems where char is signed, there is a more subtle bug. Suppose for example that EOF is -1; then if character 255 is read it will be converted to the value -1 and terminate the input prematurely.

If I make ch an unsigned type, I can see why the loop never terminates: ch will always be positive. But if I keep it signed, as shown above, is it ever possible for ch to be 255 (which would be interpreted as -1)? I mean, nothing except Ctrl+Z will break the loop, will it? So why is it changed to "int ch"? Why can't it stay as "char ch"?
Thanks

**quzah** · 06-27-2009

Originally Posted by BEN10

But what if I keep it signed as shown above

Show me where you've declared that it is signed.

Quzah.

**strickyc** · 06-27-2009

Originally Posted by quzah

Show me where you've declared that it is signed.

Quzah.

Isn't it signed by default? Only unsigned if it's declared as so.
Even though it isn't made explicit, that doesn't mean it's not signed.

edited: Added content
http://gcc.gnu.org/ml/gcc/2007-01/msg00966.html

During the standards process of the original C standard (ANSI C89),
Dennis Ritchie expressed an opinion that in hindsight, making chars
signed was a bad idea, and that logically chars should be unsigned.

After it was made standard I don't think it has ever changed. It would be a problem if some compilers made it signed by default and others unsigned by default.

Also a char is an int. The compiler does the look-up on the table to tell you what character the number is supposed to represent. I'm sure you know that already; I've read several of your posts. When you use a char in an expression that gets evaluated (in ifs, whiles, fors and so on), you should think of it as a number.

Ctrl+Z is only the EOF key on PCs; it's something different on Unix, I think.
while(ch!=EOF)
just says to continue as long as I don't press the end-of-file key.

I guess to be safe, make it explicit and just declare it signed.

**Sebastiani** · 06-27-2009

>> Isn't it signed by default? Only unsigned if it's declared as so.

AFAIK, no. It's implementation defined.

**strickyc** · 06-27-2009

Originally Posted by Sebastiani

>> Isn't it signed by default? Only unsigned if it's declared as so.

AFAIK, no. It's implementation defined.

I'm pretty sure it's signed by default as a standard. Any compiler that doesn't have it like this doesn't adhere to the standards. Also I think it's an option you can change on some compilers.

**quzah** · 06-27-2009

Originally Posted by strickyc

I'm pretty sure it's signed by default as a standard.

Well you're wrong. The standard doesn't say it has to be signed. It can be either one. That's the entire reason why the program is wrong in the first post.

Quzah.

**quzah** · 06-27-2009

Originally Posted by strickyc

It has to be one or the other. It can't be both and it can't be neither. The standard states it should be signed. That's what makes it a standard. I understand not everything is up to all the standards.

No it doesn't.

Originally Posted by The C Standard

The three types char, signed char and unsigned char are collectively called character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

Colored emphasis mine.

Quzah.

**strickyc** · 06-27-2009

Character types in C and C++

Ok, you win. It's not a standard.

**BEN10** · 06-27-2009

Originally Posted by quzah

Show me where you've declared that it is signed.

Quzah.

If it were unsigned, its range would be 0 to 255, but when I try this:

Code:

#include<stdio.h>
int main(void)
{
	char ch=128;
	printf("%d",ch);
}

it prints -128, which means its range is not 0 to 255 and thus it is signed by default. This is true for any number outside the range -128 to 127.

**cyberfish** · 06-27-2009

The program has bugs whether it's signed or unsigned (and I believe it is implementation defined) -

If it's signed, and the byte 255 (0xFF) is read, it will be sign extended to (0xFFFFFFFF), which, if interpreted as a signed int, would be -1, and the loop will end prematurely.

If it's unsigned, the loop will never terminate, because when -1 (EOF) is read, it will be truncated to 0xFF, and then zero-extended to 0x000000FF, which is obviously different from 0xFFFFFFFF.

As for whether 255 can be read, of course. Think about input redirection.

Why not just keep it as an int?

**grumpy** · 06-27-2009

Originally Posted by BEN10

If it were unsigned, its range would be 0 to 255, but when I try this:

Code:

#include<stdio.h>
int main(void)
{
	char ch=128;
	printf("%d",ch);
}

it prints -128, which means its range is not 0 to 255 and thus it is signed by default. This is true for any number outside the range -128 to 127.

..... for your compiler.

Other compilers are allowed to do different things to what you have observed. And, in practice, they do. That is, roughly, the meaning of "implementation defined" in the standard.

The original program has bugs because getchar() returns an int. If the return value is outside the range of values that can be represented in a char, then a conversion occurs, and that conversion changes the value stored in ch.

**BEN10** · 06-27-2009

Originally Posted by cyberfish

The program has bugs whether it's signed or unsigned (and I believe it is implementation defined) -

If it's signed, and the byte 255 (0xFF) is read, it will be sign extended to (0xFFFFFFFF), which, if interpreted as a signed int, would be -1, and the loop will end prematurely.

If it's unsigned, the loop will never terminate, because when -1 (EOF) is read, it will be truncated to 0xFF, and then zero-extended to 0x000000FF, which is obviously different from 0xFFFFFFFF.

As for whether 255 can be read, of course. Think about input redirection.

Why not just keep it as an int?

Can you show me an example where 255 is read? I don't know what input redirection is.
Btw, there's no problem keeping it as an int, but I just wanted to know why they do that if it can be done with a char too.

**cyberfish** · 06-27-2009

Assuming a.txt contains a single byte - 255.

Code:

yourprogram.exe < a.txt

To your program, it would seem like 255 was entered in the console.

Or if your program is taking input from another program (less common in Windows, but very common in UNIX) -

Code:

anotherprogram.exe | yourprogram.exe

And of course, anotherprogram can output any byte it wants.

[edit]
Also, 255 is a perfectly ordinary character in some single-byte encodings (in ISO-8859-1 it is "ÿ"). It can't turn up in valid UTF-8, though, since the byte 0xFF is never used there.

Or maybe the user doesn't use ASCII (some mainframes don't, and many non-English languages don't), and 255 represents something common.
[/edit]

**robwhit** · 06-27-2009

> Also a char is an int.

char is an integer type. "Integer" is a classification; int is another type within that classification.

>The compiler does the look-up on the table to tell you what character the number is supposed to represent.

Unless the compiler is displaying the graphical representation of a character, it probably doesn't need to look up the encoding in a table. Even if it printed the character to the screen, it probably lets another code library or the hardware do the lookup for it.

**cpjust** · 06-27-2009

Since getchar() returns an int, you should get a warning telling you about 'possible loss of data' when assigning an int to a char. Assuming of course that you enable a high compiler warning level, which you always should.
