Thread: Is this safe?

  1. #1
    Registered User
    Join Date
    Jan 2005
    Posts
    204

    Is this safe?

    Code:
    char c = 'A';
    c = tolower(c);
    Is there a chance that something bad might happen here? Thanks.

  2. #2
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    tolower and the other character functions expect an int whose value is representable as an unsigned char (or equals EOF). You are passing a char, which may be signed depending on your compiler and options. Therefore, it is a good idea to use a cast with these functions:
    Code:
    c = tolower( (unsigned char) c);
    Other than that, your code is fine. It is acceptable to use a variable twice in a statement. What is not acceptable is to modify it twice. This would produce undefined behaviour:
    Code:
    c = tolower( (unsigned char) c++);
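
For reference, a minimal sketch of the cast suggested in this post, wrapped in a helper so it only has to be written once. The name `to_lower_char` is made up for illustration and is not from the thread or from any library.

```c
#include <ctype.h>

/* Hypothetical helper: lowercase a plain char safely.
 * The cast maps a possibly negative char into the 0..UCHAR_MAX range
 * that tolower() is specified to accept. */
char to_lower_char(char c)
{
    return (char)tolower((unsigned char)c);
}
```

With the helper, the original snippet becomes `c = to_lower_char(c);` and behaves the same whether plain char is signed or unsigned.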

  3. #3
    Registered User mitakeet's Avatar
    Join Date
    Jun 2005
    Location
    Maryland, USA
    Posts
    212
    Actually, the second should be very defined. c is incremented AFTER it is passed to tolower, BEFORE it is assigned the value returned from tolower.

    Free code: http://sol-biotech.com/code/.

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw

  4. #4
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Quote Originally Posted by mitakeet
    Actually, the second should be very defined. c is incremented AFTER it is passed to tolower, BEFORE it is assigned the value returned from tolower.
    Yep, my mistake, there is a sequence point before the function call. I was thinking of the assignment, which is not a sequence point.
    C89 Draft

    The following are the sequence points described in 2.1.2.3

    * The call to a function, after the arguments have been evaluated (3.3.2.2).

    * The end of the first operand of the following operators: logical AND && (3.3.13); logical OR || (3.3.14); conditional ? (3.3.15); comma , (3.3.17).

    * The end of a full expression: an initializer (3.5.7); the expression in an expression statement (3.6.3); the controlling expression of a selection statement ( if or switch ) (3.6.4); the controlling expression of a while or do statement (3.6.5); the three expressions of a for statement (3.6.5.3); the expression in a return statement (3.6.6.4).
    Take 2. This is bad:
    Code:
    c = tolower( (unsigned char) c) + c++;
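
The "Take 2" line both assigns to c and increments it with no sequence point in between, which is what makes it undefined. A minimal sketch of one well-defined alternative follows, assuming the intent was "get the lowercase of the current value, then step c forward". That intent is a guess, and `lower_then_advance` is an invented name.

```c
#include <ctype.h>

/* Hypothetical helper: returns the lowercase form of *c, then increments *c.
 * Each statement is a full expression, so a sequence point separates
 * the read of *c from its modification. */
int lower_then_advance(char *c)
{
    int lowered = tolower((unsigned char)*c); /* only reads *c */
    (*c)++;                                   /* modifies *c exactly once */
    return lowered;
}
```

Splitting the work into separate statements is the usual fix: every object is modified at most once per full expression.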

  5. #5
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Or perhaps just:
    Code:
    c = c++;
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  6. #6
    Registered User
    Join Date
    Jun 2004
    Posts
    722
    Quote Originally Posted by pianorain
    Or perhaps just:
    Code:
    c = c++;
    that should however be the same as c=c;c++;

  7. #7
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >that should however be the same as c=c;c++;
    But it isn't. The expression is undefined because it modifies c more than once between sequence points.
    My best code is written with the delete key.

  8. #8
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    The cast is useless. All values of an unsigned character will fit in an integer. You can pass it a negative value, it just doesn't have to do anything with it.


    Quzah.
    Hope is the first step on the road to disappointment.

  9. #9
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Quote Originally Posted by Quzah
    The cast is useless. All values of an unsigned character will fit in an integer. You can pass it a negative value, it just doesn't have to do anything with it.
    Using a cast is strongly recommended*. If you pass a negative value (and not EOF) to one of the ctype functions, the behaviour is undefined:
    Quote Originally Posted by C Standard
    The header <ctype.h> declares several functions useful for testing and mapping characters. In all cases the argument is an int , the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF . If the argument has any other value, the behavior is undefined.
    This is not a theoretical issue. Some modern C libraries will crash or give the wrong result without the cast.

    *Although, in this specific case 'A' is guaranteed to be positive, so it's not needed.

    http://cboard.cprogramming.com/showthread.php?p=427882
    http://groups-beta.google.com/group/...a8ff53f0?hl=en
    http://www.greenend.org.uk/rjk/2001/02/cfu.html
    http://www.stanford.edu/~blp/writing...type-cast.html
    http://groups-beta.google.com/group/...94bd62676bcf91
    http://groups-beta.google.com/group/...b4b76ac886b3ff
    http://www.cs.mu.oz.au/research/merc...9807/0309.html
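
To make the quoted rule concrete, here is a small sketch of the two situations it covers. Values returned by getc() or getchar() are already an int that is either EOF or in unsigned char range, so they can be passed as they are. Values read from a plain char buffer may be negative and need the cast. Both function names are made up for this illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* For a value that came from getc()/getchar(): pass it through unchanged.
 * EOF is an allowed argument, and isupper(EOF) is simply false. */
int is_upper_from_stream(int ch)
{
    return isupper(ch) != 0;
}

/* For characters stored in a plain char buffer: cast each one first,
 * because a char with the high bit set may be negative. */
size_t count_upper(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```

Note that casting a getchar() result to unsigned char would be a mistake of its own, since it would turn EOF into an ordinary character value.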

  10. #10
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Quote Originally Posted by Prelude
    The expression is undefined because it modifies c more than once between sequence points.
    [offtopic]
    Why don't schools teach students about sequence points? I've never heard it mentioned in any CS syllabus. It wasn't until certain individuals here at CBoard kept banging sequence points into my head that I finally got it. It seems pretty important to know, since a lack of knowledge can lead to creating code that results in undefined expressions.
    [/offtopic]

  11. #11
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Why don't schools teach students about sequence points?
    Because sequence points are "advanced trivia" and only required knowledge if you want to understand the language. Schools only want you to memorize their bad habits, not understand enough to correct them in public. Have you ever seen how we jump on people claiming to be teachers? That's what schools try to avoid in the classroom.

  12. #12
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by anonytmouse
    Using a cast is strongly recommended*. If you pass a negative value (and not EOF) to one of the ctype functions, the behaviour is undefined
    It's still pointless. Why mung up your data just so it fits what the function handles? Pass it the right value in the first place, and don't waste your time on pointless casts.

    [edit]
    Here, let me further the point of why your cast is not only pointless, it's wrong. Read me. Now do you see why your cast is not only pointless, it's wrong to do so? If not, read that again.

    However, if you insist, here is the "correct wrong way" to do it:
    Code:
    toupper( (int)((unsigned char) x) );
    Why? Because the function doesn't want an unsigned char. It wants an integer. So if you're going to be "correct" in your wrongness, it's only fitting that you typecast it back to an int. After all, that's what the function wants.

    Don't you just hate being wrong?
    [/edit]
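
On the extra (int) cast: as far as the language rules go, <ctype.h> declares toupper with a parameter of type int, so an unsigned char argument is converted to int at the call anyway. A minimal sketch comparing the two spellings follows; both function names are invented for the comparison.

```c
#include <ctype.h>

/* One cast: the unsigned char value is converted to int implicitly. */
int upper_one_cast(char x)
{
    return toupper((unsigned char)x);
}

/* Two casts: the conversion to int is spelled out, with the same result. */
int upper_two_casts(char x)
{
    return toupper((int)((unsigned char)x));
}
```

Both versions pass the same int value to toupper, so the second cast is a matter of style rather than correctness.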


    Quzah.
    Last edited by quzah; 06-28-2005 at 09:33 PM.

  13. #13
    Geek. Cobras2's Avatar
    Join Date
    Mar 2002
    Location
    near Westlock, an hour north of Edmonton, Alberta, Canada
    Posts
    113
    Quote Originally Posted by pianorain
    [offtopic]
    Why don't schools teach students about sequence points? I've never heard it mentioned in any CS syllabus. It wasn't until certain individuals here at CBoard kept banging sequence points into my head that I finally got it. It seems pretty important to know, since a lack of knowledge can lead to creating code that results in undefined expressions.
    [/offtopic]
    Well, Athabasca University still uses void main() in their assignments, as of 2005 :-/

    I was a little surprised to see that was the case, but there you go. I guess the teachers don't know everything after all.
    James G. Flewelling
    Registered Linux User #327359
    Athabasca University Student (BSc. CIS)

    http://catb.org/~esr/faqs/smart-questions.html
    http://catb.org/jargon/

  14. #14
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Quote Originally Posted by Quzah
    It's still pointless. Why mung up your data just so it fits what the function handles? Pass it the right value in the first place, and don't waste your time on pointless casts.
    Here are some implementations of ctype functions from the compilers I have installed on my machine.
    Code:
    extern	unsigned char	_ctype[];
    
    #define	isalpha(c)	((_ctype+1)[c]&(_UPPER|_LOWER))
    #define	isupper(c)	((_ctype+1)[c]&_UPPER)
    #define	islower(c)	((_ctype+1)[c]&_LOWER)
    #define	isdigit(c)	((_ctype+1)[c]&_DIGIT)
    Code:
    extern const unsigned short *_pctype;
    
    #define __chvalidchk(a,b)       (_pctype[a] & (b))
    
    #define isalpha(_c)     (MB_CUR_MAX > 1 ? _isctype(_c,_ALPHA) : __chvalidchk(_c, _ALPHA))
    #define isupper(_c)     (MB_CUR_MAX > 1 ? _isctype(_c,_UPPER) : __chvalidchk(_c, _UPPER))
    #define islower(_c)     (MB_CUR_MAX > 1 ? _isctype(_c,_LOWER) : __chvalidchk(_c, _LOWER))
    Code:
    /* According to standard for SB chars, these function are defined only
     * for input values representable by unsigned char or EOF.
     * Thus, there is no range test.
     */
    unsigned short* _pctype;
    #define __ISCTYPE(c, mask)  (MB_CUR_MAX == 1 ? (_pctype[c] & mask) : _isctype(c, mask))
    
    extern __inline__ int isalnum(int c) {return __ISCTYPE(c, (_ALPHA|_DIGIT));}
    extern __inline__ int isalpha(int c) {return __ISCTYPE(c, _ALPHA);}
    extern __inline__ int iscntrl(int c) {return __ISCTYPE(c, _CONTROL);}
    Do you see why the cast is needed yet? Negative character values are valid in many locales.
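
The implementations above all index a lookup table with the argument and do no range check, so a negative char becomes a negative array index. Below is a minimal sketch of the same idea with the cast applied. The table, the flag, and the function names are all hypothetical and do not come from any of the quoted headers.

```c
#include <limits.h>

#define DEMO_UPPER 0x01 /* hypothetical flag bit, not from a real library */

/* Hypothetical table: one entry per unsigned char value, zero-initialized. */
static unsigned char demo_table[UCHAR_MAX + 1];

/* Mark the 26 basic uppercase letters in the table. */
void demo_init(void)
{
    const char *letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (const char *p = letters; *p != '\0'; p++)
        demo_table[(unsigned char)*p] = DEMO_UPPER;
}

/* Safe lookup: the cast maps every char value into 0..UCHAR_MAX,
 * so the index can never be negative. Without the cast, a char
 * holding -23 would read demo_table[-23], which is out of bounds. */
int demo_isupper(char c)
{
    return (demo_table[(unsigned char)c] & DEMO_UPPER) != 0;
}
```

This is the same thing the wrapper macros further down do: the cast happens once, at the point where a plain char meets a table indexed from zero.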

    Good thread on the issue.
    Previous debate on the issue.
    Apache wrapper macros.

    Wget wrapper macros:

    /* OK, now define a decent interface to ctype macros. The regular
    ones misfire when you feed them chars >= 127, as they understand
    them as "negative", which results in out-of-bound access at
    table-lookup, yielding random results. This is, of course, totally
    bogus. One way to "solve" this is to use `unsigned char'
    everywhere, but it is nearly impossible to do that cleanly, because
    all of the library functions and system calls accept `char'.

    Thus we define our wrapper macros which simply cast the argument to
    unsigned char before passing it to the <ctype.h> macro. These
    versions are used consistently across the code. */
    #define ISASCII(x) isascii ((unsigned char)(x))
    #define ISALPHA(x) isalpha ((unsigned char)(x))
    #define ISSPACE(x) isspace ((unsigned char)(x))
    #define ISDIGIT(x) isdigit ((unsigned char)(x))
    #define ISXDIGIT(x) isxdigit ((unsigned char)(x))
    Last edited by anonytmouse; 06-29-2005 at 11:43 AM.

  15. #15
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Do you see why the cast is wrong? No? Go read the EOF FAQ again then.


    Quzah.
