Thread: Is this safe?

  1. #16
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    I think this is the point tmouse is trying to make.
    Code:
    #include <stdio.h>
    #include <ctype.h>
    
    int myisprint(int value)
    {
       if ( value < 0 )
       {
          puts("Bzzzt!"); /* potentially undefined behavior */
          return 0;
       }
       return isprint(value);
    }
    
    void foo(const char *text)
    {
       puts("foo");
       for ( ; *text; ++text )
       {
          myisprint(*text);
       }
    }
    
    void bar(const char *text)
    {
       puts("bar");
       for ( ; *text; ++text )
       {
          myisprint((unsigned char)*text);
       }
    }
    
    int main(void)
    {
       static const char text[] = "Ich möchte"; /* some text I googled */
       foo(text);
       bar(text);
       return 0;
    }
    
    /* my output
    foo
    Bzzzt!
    bar
    */
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  2. #17
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Quzah, if char is signed by default, characters with a negative numerical value will yield undefined behaviour on Anonytmouse's implementation, even though those values are perfectly valid.
    Quote Originally Posted by Quzah
    Don't you just hate being wrong?
    Please.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  3. #18
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > if char is signed by default
    It isn't - whether an unqualified char is signed or unsigned is up to the implementation.

    gcc for example has -fsigned-char and -funsigned-char compilation options to fiddle things when code makes the wrong choices.
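
    To see which choice a given implementation makes, a small sketch like the one below may help. The helper names plain_char_is_signed and ctype_arg_from_char are made up for illustration; the only things taken from the C library are CHAR_MIN and UCHAR_MAX from <limits.h>.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: 1 if plain char is signed here, 0 if unsigned.
   CHAR_MIN is 0 for an unsigned plain char and negative for a signed one. */
int plain_char_is_signed(void)
{
   return CHAR_MIN < 0;
}

/* Hypothetical helper: convert a plain char taken from a string into a
   value that is safe to hand to the is* and to* functions. */
int ctype_arg_from_char(char ch)
{
   int arg = (unsigned char)ch;
   assert(arg >= 0 && arg <= UCHAR_MAX); /* always holds after the cast */
   return arg;
}
```

    With gcc, building the same file once with -fsigned-char and once with -funsigned-char should flip the result of plain_char_is_signed(), while ctype_arg_from_char() stays within 0..UCHAR_MAX either way.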
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #19
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by Sang-drax
    Quzah, if char is signed by default, characters with a negative numerical value will yield undefined behaviour on Anonytmouse's implementation, even though those values are perfectly valid.
    Please.
    Newsflash: EOF is negative! Now be a good little programmer and go read the EOF FAQ.

    Since I know it'll be too much work for you to go read yourself, I'll give you another version of the FAQ:

    C: A reference manual, 5th edition. Page 365.
    The value EOF is conventionally used as a value that signals end of file--that is, the
    exhaustion of input data. It has the value of -1 in most traditional implementations, but
    Standard C requires only that it be a negative integral constant expression.
    So what part of negative and integral are you two not understanding?

    Don't you just hate being wrong?


    Quzah.
    Hope is the first step on the road to disappointment.

  5. #20
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    EOF is an exception -- all other negative numbers passed to is* and to* will invoke undefined behavior.

    [edit]Read the Chris Torek post that tmouse posted earlier.

  6. #21
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    It's not an exception. It's a requirement. It is required by the standard to be supported. Now it's your turn to go read the EOF FAQ. Can no one here read any more? Come on Dave! You know better. Why is this wrong?
    Code:
    char c;
    
    while( (c = getchar()) != EOF )
        ...damn, too bad that's a char...
    It's wrong. You're wrong. They're wrong. Have a nice day.
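
    For readers who have not met that FAQ entry: getchar and fgetc return an int so that EOF stays distinguishable from every byte value. A minimal sketch of the usual fix follows; count_printable is a hypothetical helper, not something posted in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read `in` to end of file, return how many bytes were
   printable, and store the total number of bytes read in *total. */
size_t count_printable(FILE *in, size_t *total)
{
   size_t printable = 0;
   int c; /* int, not char: must hold every unsigned char value plus EOF */

   assert(in != NULL && total != NULL);
   *total = 0;
   while ( (c = fgetc(in)) != EOF )
   {
      ++*total;
      if ( isprint(c) ) /* no cast: c is already EOF or 0..UCHAR_MAX */
      {
         ++printable;
      }
   }
   return printable;
}
```

    With char c instead, on typical implementations a 0xFF byte compares equal to EOF where char is signed and EOF is -1, so the loop ends early; where char is unsigned the comparison never succeeds and the loop never ends.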


    Quzah.

  7. #22
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    7.4 Character handling <ctype.h>
    1 The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
    So, how many unsigned chars have negative values?
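
    The wording quoted above can be turned into a range check. This is only a sketch: in_ctype_domain and asserted_isprint are invented names, and the assert is one possible way to surface a bad argument in a debug build, not anything the standard requires.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: 1 if `value` is a legal argument for the <ctype.h>
   functions, i.e. EOF or representable as an unsigned char. */
int in_ctype_domain(int value)
{
   return value == EOF || (value >= 0 && value <= UCHAR_MAX);
}

/* Hypothetical debug wrapper: stop at the bad call instead of running into
   undefined behavior inside isprint. */
int asserted_isprint(int value)
{
   assert(in_ctype_domain(value));
   return isprint(value);
}
```

    On an implementation where EOF is -1, in_ctype_domain(-40) is 0. No unsigned char has a negative value, so the only negative argument the functions accept is EOF itself.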

  8. #23
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    So how many typecasts to a negative value, which just happens to be EOF, still accurately represent EOF? Oh, and what is EOF again? That's right, any negative number, as defined by your compiler. Let me say that again, because you apparently can't grasp the concept:

    EOF can be any negative number.

    One more time, all together: "What happens when you typecast EOF to an unsigned char?"


    Quzah.

  9. #24
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by quzah
    Let me say that again, because you apparently can't grasp the concept:

    EOF can be any negative number.
    It can be any one number -- it is not the same as all negative numbers.

    (Really, the Torek piece explains this much better than I do.)

  10. #25
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Right. So where again in your automatic casting to an unsigned char do you even remotely consider EOF? Oh, that's right, you didn't. That makes it wrong now doesn't it?


    Quzah.

  11. #26
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    The is* and to* functions have defined behavior for EOF.

  12. #27
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    You clearly aren't understanding. How does casting EOF to an unsigned char give you the correct result from any is* or to* function? It doesn't. Therefore, casting to an unsigned char is wrong. How can you possibly get an accurate result from an is* or to* function when you've just typecast your data to something else entirely? EOF is handled. Yes. But what happens when you've now butchered EOF so it is no longer EOF?


    Quzah.

  13. #28
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    You're right, I've gotten lost a little here. I don't have to worry about EOF because my data was in a string. As some of the previous links pointed out, if the value comes from fgetc and the like, the cast is undesired. But for the example I posted, the cast prevents UB by mapping a negative value into the range of unsigned char.
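
    Both halves of this exchange can be checked in a few lines. The sketch below uses two made-up helpers: cast_preserves_eof shows why the cast is wrong for a value that came straight from fgetc, and count_printable_in_string shows the cast doing its job for char data taken from a string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: 1 if EOF survives a cast to unsigned char.
   EOF is negative and an unsigned char value is not, so on ordinary
   implementations (int wider than char) this returns 0. */
int cast_preserves_eof(void)
{
   return (int)(unsigned char)EOF == EOF;
}

/* Hypothetical helper: count printable characters in a string. The elements
   are plain char and may be negative, so each one is cast before the call. */
size_t count_printable_in_string(const char *text)
{
   size_t printable = 0;

   assert(text != NULL);
   for ( ; *text; ++text )
   {
      if ( isprint((unsigned char)*text) )
      {
         ++printable;
      }
   }
   return printable;
}
```

    So the rule of thumb is about where the value came from: an int returned by fgetc or getchar goes to is* and to* unchanged, while a char read out of a buffer gets the cast.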

  14. #29
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    If your data was in a string, then you don't even have to worry about negative values. So again, the cast is pointless, as I stated earlier. Also as I stated earlier, the call to the function is not the place to be making sure your data is correct. You make sure it's correct and then you pass it, or you don't pass it at all.


    Quzah.

  15. #30
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    http://groups-beta.google.com/group/...a8ff53f0?hl=en
    Code:
    I have no idea when the last discussion was, but yes, something like
    this is a good idea.  It applies to all the <ctype.h> functions.  The
    problem is that their domain -- the set of values they take -- is
    {EOF, [0..UCHAR_MAX]}.  That is, toupper(EOF) is defined; toupper(0)
    is defined; toupper(1), toupper(2), toupper(3), etc., are defined;
    toupper(128) through toupper(255) are all defined; and if UCHAR_MAX
    exceeds 255, additional toupper()s are defined.
    
    On the other hand, unless EOF is -40, toupper(-40) is *not* defined.
    If plain "char" is signed, and if "char *p" happens to point to a
    plain "char" that has a value of -40, then:
    
            toupper(*p)
    
    is *not* defined.  (It gives rise to the dreaded "undefined behavior".)
    You can write either of these:
    
            toupper((unsigned char)*p)
            toupper(*(unsigned char *)p)
    
    which have different meanings on one's complement and sign-and-magnitude
    systems.  So, which one you should write depends on what is in the
    memory to which "p" points.
    
    All "normal" characters are nonnegative, so toupper('a') is definitely
    'A', whether 'a' is 0x61 (ASCII) or 0x81 (EBCDIC) or something else
    entirely.  (This implies that, on 8-bit EBCDIC systems such as IBM
    mainframes, plain "char" must in fact be unsigned.)  So if "p"
    points to normal text, the undefined-behavior aspect of toupper(*p)
    will not rear its ugly head.  Unfortunately, all *that* really means
    is that bugs tend to get past testing, and produce undefined behavior
    when someone in Europe runs ISO-Latin-1 text through the program.
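
    As a concrete instance of the first form, toupper((unsigned char)*p), here is a minimal sketch of an in-place uppercase routine. upcase_in_place is a name invented for this example, and storing the result back with a (char) cast assumes the usual behavior where the value round-trips through plain char.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase a string in place and return how many
   characters were changed. */
size_t upcase_in_place(char *p)
{
   size_t changed = 0;

   assert(p != NULL);
   for ( ; *p; ++p )
   {
      int upper = toupper((unsigned char)*p); /* argument is 0..UCHAR_MAX */
      if ( upper != (unsigned char)*p )
      {
         *p = (char)upper; /* assumption: value round-trips through char */
         ++changed;
      }
   }
   return changed;
}
```

    In the default "C" locale a byte such as 0xF6 is typically left alone, but the call is now well defined whether plain char is signed or not.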
    [edit]http://www.stanford.edu/~blp/writing...type-cast.html
    Of course, if you store the return value of one of these functions (getchar, fgetc and the like) in a char object, then pass the char, the cast becomes necessary again.
    As in,
    Code:
    #include <stdio.h>
    #include <ctype.h>
    
    int myisprint(int value)
    {
       if ( value < 0 )
       {
          puts("Bzzzt!");
          return 0;
       }
       return isprint(value);
    }
    
    /* embedded nasty character: ö */
    int main(void)
    {
       char buffer[1024];
       int i, j;
       for ( i = 0; i < 1024; ++i )
       {
          int c = getchar();
          if ( c == EOF )
          {
             break;
          }
          buffer[i] = c;
       }
       for ( j = 0; j < i; ++j )
       {
          if ( myisprint(buffer[j]) )
          {
             /* putchar(buffer[j]); */
          }
       }
       return 0;
    }
    
    /* my output
    C:\Test>test < test.c
    Bzzzt!
    */
    Last edited by Dave_Sinkula; 06-29-2005 at 03:41 PM.
