Comparing characters

Posted in the C++ Programming forums, part of the General Programming Boards category.

  1. #16
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Posts
    22,920
    Quote Originally Posted by cpjust View Post
    Actually, wouldn't it be more like this:
    Code:
    #ifdef UNICODE
    #define _T( x )   L ## x
    #else
    #define _T( x )   x
    #endif
    Oops! Yes, you're right.
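
    For readers unfamiliar with token pasting, here is a minimal sketch of what the quoted macro does. MY_T is a hypothetical stand-in for _T (chosen so the sketch cannot clash with a real <tchar.h>), and my_t_is_wide is an illustrative helper, not part of any library.

```cpp
#include <cassert>
#include <type_traits>

// MY_T is a hypothetical stand-in for _T, used here for illustration only.
#ifdef UNICODE
#define MY_T(x) L##x   // paste the L prefix onto the literal: '0' becomes L'0'
#else
#define MY_T(x) x      // leave the literal as a narrow one
#endif

// True when MY_T('0') produced a wide character literal.
constexpr bool my_t_is_wide()
{
    return std::is_same<decltype(MY_T('0')), wchar_t>::value;
}
```

    Building with UNICODE defined should flip the result, because the same source text then expands to L'0' instead of '0'.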

    Quote Originally Posted by ZuK View Post
    Both char and wchar_t are integral types. There is never a cast needed to compare them. At best you will get a warning if the comparison might not work as expected (e.g. signed <> unsigned).
    Kurt
    Which means no cast will be required at all. You can simply compare to a character without trouble, so there's no need for any fancy work here for that function.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  2. #17
    Registered User
    Join Date
    Jan 2008
    Posts
    58
    Thanks.

  3. #18
    Registered User
    Join Date
    Jan 2005
    Posts
    7,344
    >> Both ifs evaluate as true.

    But that code is not relevant to the question. The question is whether the compare will work as intended in all cases. Can you say for sure that there isn't any other wchar_t value that, when compared against the char '0', will evaluate to true? That's the question.

  4. #19
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Posts
    22,920
    I don't think so.
    The compiler treats the wchar_t as a word so it will compare a word against '0'.
    If it compared only the high or low byte of the value, then we could get false positives, but from looking at the assembly, it doesn't appear to be so.
    But I can do a test.

    UPDATE:
    Code:
    	wchar_t w = 0;
    	for (int i = 0; i <= 0xFFFF; i++, w++)
    	{
    		if (w == '0') cout << "YES! w == '0'!\n";
    	}
    Generates one output saying the if is true. So it's safe.
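
    A self-contained variant of the loop above might look like the sketch below. It assumes wchar_t is at least 16 bits wide and only scans 0 to 0xFFFF, so it says nothing about higher values on platforms with a wider wchar_t. The function name is made up for illustration.

```cpp
#include <cassert>

// Counts how many wchar_t values in [0, 0xFFFF] compare equal to the narrow literal '0'.
// The loop counter is a long, so the loop bound does not depend on the width of wchar_t.
int count_values_equal_to_narrow_zero()
{
    int matches = 0;
    for (long v = 0; v <= 0xFFFFL; ++v)
    {
        wchar_t w = static_cast<wchar_t>(v);
        if (w == '0')
            ++matches;
    }
    return matches;
}
```

    Using a separate counter instead of incrementing the wchar_t itself avoids relying on wrap-around behaviour of wchar_t.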
    Last edited by Elysia; 01-13-2008 at 05:59 AM.

  5. #20
    ZuK
    Registered User
    Join Date
    Aug 2005
    Location
    Austria
    Posts
    1,990
    Quote Originally Posted by Elysia View Post
    I don't think so.
    The compiler treats the wchar_t as a word so it will compare a word against '0'.
    If it compared only the high or low byte of the value, then we could get false positives, but from looking at the assembly, it doesn't appear to be so.
    But I can do a test.
    This is not the question.
    The question is: does the wide char L'0' have the same value as the char '0'?
    Kurt

  6. #21
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Posts
    22,920
    I thought I just busted that. There is only one value in wchar_t's range (word) that is equal to '0' and that is 0x30, the same as L'0' which is 0x0030.

  7. #22
    Captain Crash brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,263
    Assuming that the wide char encoding and the non-wide char encoding are equivalent enough to directly compare character values is probably unsafe.
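
    One way to make that assumption explicit rather than silent is to check it in the code. This is only a sketch: the function name is hypothetical, and it verifies the digits on the platform it is built for, nothing more.

```cpp
#include <cassert>

// Compile-time check: refuse to build where the narrow and wide '0' differ.
static_assert('0' == L'0', "narrow and wide encodings disagree on '0'");

// Run-time variant covering all ten decimal digits.
bool narrow_and_wide_digits_agree()
{
    const char    narrow[] = "0123456789";
    const wchar_t wide[]   = L"0123456789";
    for (int i = 0; i < 10; ++i)
    {
        if (narrow[i] != wide[i])
            return false;
    }
    return true;
}
```

    As far as I know, the standard also describes a macro, __STDC_MB_MIGHT_NEQ_WC__, that an implementation may define to signal that basic characters can have different narrow and wide values, so checking for it is another option.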

  8. #23
    Captain Crash brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,263
    Quote Originally Posted by Elysia View Post
    I thought I just busted that. There is only one value in wchar_t's range (word) that is equal to '0' and that is 0x30, the same as L'0' which is 0x0030.
    "Wide" doesn't have to mean "Unicode." This is an unsafe assumption.

  9. #24
    Registered User
    Join Date
    Jan 2005
    Posts
    7,344
    >> The question is: does the wide char L'0' have the same value as the char '0'?
    That is not my question. My question is whether there are any other wchar_t values that, when compared to '0', will return true. Will the wchar_t be converted to char, lose some of its information and become '0'?
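
    On the narrowing question: in C++ both operands of == go through the integral promotions and the usual arithmetic conversions, so they are converted to a common type at least as wide as int, and the wide operand is not cut down to char. A small sketch of a check, assuming wchar_t is wider than 8 bits (the function name is made up):

```cpp
#include <cassert>

// Returns true if a wchar_t whose low byte matches '0' but which has a higher bit set
// still compares unequal to '0', i.e. the wide operand is not narrowed to char.
bool high_bits_survive_comparison()
{
    const char narrow_zero = '0';
    // Same low byte as '0', plus one extra bit above it (assumes wchar_t > 8 bits).
    const wchar_t w = static_cast<wchar_t>(narrow_zero + 0x100);
    return !(w == narrow_zero);
}
```

    This only addresses truncation. Whether L'0' and '0' have the same value is a separate encoding question.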

  10. #25
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,308
    Just don't try comparing chars to wchar_ts. Create a traits class with explicit specialisations for char and wchar_t.
    Then do:
    Code:
    if (w == specialCharTraits<T>::zero())
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #26
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Posts
    22,920
    So for a new test I tried,
    Code:
    	wchar_t w = 0;
    	for (int i = 0; i <= 0xFFFF; i++, w++)
    	{
    		if (w == '0' && w == L'0') cout << "YES! w == '0' && w == L'0'!\n";
    		else if (w == '0') cout << "YES! w == '0'!\n";
    		else if (w == L'0') cout << "AND YES! w == L'0'!\n";
    	}
    	unsigned char c = 0;
    	for (int i = 0; i < 0xFF; i++, c++)
    	{
    		if (c == '0' && w == L'0') cout << "YES! w == '0' && w == L'0'!\n";
    		else if (c == '0') cout << "YES! w == '0'!\n";
    		else if (c == L'0') cout << "AND YES! w == L'0'!\n";
    	}
    And it outputs
    YES! w == '0' && w == L'0'!
    YES! w == '0'!

    And you don't need to worry about the wchar_t getting implicitly cast to a char. It's not, according to the assembly:

    Code:
    00401DB0  cmp         di,30h 
    00401DB4  jne         wmain+8Ah (401DCAh) 
    00401DB6  mov         ecx,dword ptr [__imp_std::cout (403050h)] 
    00401DBC  push        4040C0h 
    00401DC1  push        ecx  
    00401DC2  call        std::operator<<<std::char_traits<char> > (401000h) 
    00401DC7  add         esp,8 
    00401DCA  inc         edi  
    00401DCB  sub         ebx,1 
    00401DCE  jne         wmain+70h (401DB0h) 
    So basically, it stores the data in edi and uses the low-order word di to compare.
    The char loop is even easier:

    Code:
    00401DD7  cmp         bl,30h 
    00401DDA  jne         wmain+0C3h (401E03h)
    00401DDC  cmp         di,30h 
    00401DE0  jne         wmain+0B0h (401DF0h) 
    00401DE2  mov         edx,dword ptr [__imp_std::cout (403050h)] 
    00401DE8  push        404108h 
    00401DED  push        edx  
    00401DEE  jmp         wmain+0BBh (401DFBh) 
    		else if (c == '0') cout << "YES! w == '0'!\n";
    00401DF0  mov         eax,dword ptr [__imp_std::cout (403050h)] 
    00401DF5  push        404128h 
    00401DFA  push        eax  
    00401DFB  call        std::operator<<<std::char_traits<char> > (401000h) 
    00401E00  add         esp,8 
    00401E03  inc         bl   
    00401E05  sub         ebp,1 
    00401E08  jne         wmain+97h (401DD7h) 

  12. #27
    Registered User
    Join Date
    Jan 2005
    Posts
    7,344
    >> And you don't need to worry about the wchar_t getting implicitly cast to a char. It's not, according to the assembly

    How does the assembly output of one compiler on a specific platform answer the general question? The whole point is whether you can guarantee it will be safe and there won't be any false positives, so the fact that it seems to work in one instance doesn't help much.

    >> So basically, it stores the data in edi and uses the low-order word di to compare.

    Isn't that the same as converting to char? What if '0' has different data in the high-order word than the wchar_t character you are comparing? They would compare as equal when they are not.

  13. #28
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Posts
    22,920
    Quote Originally Posted by Daved View Post
    >> And you don't need to worry about the wchar_t getting implicitly cast to a char. It's not, according to the assembly

    How does the assembly output of one compiler on a specific platform answer the general question? The whole point is whether you can guarantee it will be safe and there won't be any false positives, so the fact that it seems to work in one instance doesn't help much.
    You're free to do your own tests, of course.
    My tests show it's safe on Visual Studio 2008. More than that I cannot guarantee unless it's mentioned in the standard.

    >> So basically, it stores the data in edi and uses the low-order word di to compare.

    Isn't that the same as converting to char? What if '0' has different data in the high-order word than the wchar_t character you are comparing? They would compare as equal when they are not.
    wchar_t is 2 bytes (so a word) on Visual Studio, so the assembly is perfectly fine.
    I believe it increments the full edi for performance reasons, as incrementing a 32-bit register would be faster than incrementing only a word, though I'm not sure. I'm no expert.

  14. #29
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,308
    There's no need to simply rely on observed behaviour. Just write it properly to begin with. Here is how to do it, which I meant to write this morning when I was in a mad rush:
    Code:
    // Primary template is intentionally empty: only char and wchar_t are supported.
    template<typename T> struct specialCharTraits {};
    template<> struct specialCharTraits<char> {
    	static char zero() { return '0'; }
    };
    template<> struct specialCharTraits<wchar_t> {
    	static wchar_t zero() { return L'0'; }
    };
    Then do this where you use it:
    Code:
    if (w == specialCharTraits<T>::zero()) cout << "YES! w == '0'!\n";
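
    To show how the traits class could be used from generic code, here is a sketch that repeats the definitions so it compiles on its own. countZeros is a hypothetical helper invented for this example and is not part of the original suggestion.

```cpp
#include <cassert>
#include <string>

// Primary template is left empty, so unsupported character types fail to compile.
template<typename T> struct specialCharTraits {};

template<> struct specialCharTraits<char> {
    static char zero() { return '0'; }
};
template<> struct specialCharTraits<wchar_t> {
    static wchar_t zero() { return L'0'; }
};

// Hypothetical helper: counts the '0' characters in a string of either character type.
template<typename T>
int countZeros(const std::basic_string<T>& text)
{
    int count = 0;
    for (typename std::basic_string<T>::size_type i = 0; i < text.size(); ++i)
    {
        // Each character is compared against a literal of its own type.
        if (text[i] == specialCharTraits<T>::zero())
            ++count;
    }
    return count;
}
```

    Calling countZeros(std::string("1001")) or countZeros(std::wstring(L"0a0b0")) picks the matching specialisation, so a char is never compared against a wchar_t.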
