Comparing characters

**Elysia** · 01-12-2008

Originally Posted by cpjust

Actually, wouldn't it be more like this:

Code:

#ifdef UNICODE
#define _T( x )   L ## x
#else
#define _T( x )   x
#endif

Oops! Yes, you're right.

Originally Posted by ZuK

Both char and wchar_t are integral types. A cast is never needed to compare them. At most you will get a warning if the comparison might not work as expected (e.g. signed vs. unsigned).
Kurt

Which means no cast will be required at all. You can simply compare to a character without trouble, so there's no need for any fancy work here for that function.

**Banana Man** · 01-12-2008

Thanks.

**Daved** · 01-12-2008

>> Both ifs evaluate as true.

But that code is not relevant to the question. The question is whether the compare will work as intended in all cases. Can you say for sure that there isn't any other wchar_t value that, when compared against the char '0', will evaluate to true? That's the question.

**Elysia** · 01-13-2008

I don't think so.
The compiler treats the wchar_t as a word so it will compare a word against '0'.
If it compared only the high or low byte of the value, then we could get false positives, but from looking at the assembly, it doesn't appear to be so.
But I can do a test.

UPDATE:

Code:

	wchar_t w = 0;
	for (int i = 0; i <= 0xFFFF; i++, w++)
	{
		if (w == '0') cout << "YES! w == '0'!\n";
	}

Generates one output saying the if is true. So it's safe.
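
The loop above hard-codes a 16-bit range. A minimal sketch of the same exhaustive check that asks the implementation for the range of wchar_t instead; `countWideMatches` and its `upper` cap are hypothetical names introduced here, and the result only describes the platform it runs on, not what the standard guarantees:

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Hypothetical helper: counts the wchar_t values in [0, upper] that compare
// equal to the narrow character `target`. The scan stops early if wchar_t
// cannot represent `upper` on this platform.
std::size_t countWideMatches(char target, long long upper)
{
    const long long widest = std::numeric_limits<wchar_t>::max();
    const long long last = std::min(upper, widest);

    std::size_t matches = 0;
    for (long long i = 0; i <= last; ++i) {
        const wchar_t w = static_cast<wchar_t>(i);
        if (w == target) {   // same mixed wchar_t/char comparison as in the test above
            ++matches;
        }
    }
    return matches;
}
```

Calling `countWideMatches('0', 0xFFFF)` reproduces the test above; a result of 1 means exactly one wide value in that range compares equal to '0'. The cap exists because wchar_t is 32 bits on some platforms and a full scan would be slow.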

**ZuK** · 01-13-2008

Originally Posted by Elysia

I don't think so.
The compiler treats the wchar_t as a word so it will compare a word against '0'.
If it compared only the high or low byte of the value, then we could get false positives, but from looking at the assembly, it doesn't appear to be so.
But I can do a test.

This is not the question.
The question is: does the multibyte char L'0' have the same value as the char '0'?
Kurt

**Elysia** · 01-13-2008

I thought I just busted that. There is only one value in wchar_t's range (word) that is equal to '0' and that is 0x30, the same as L'0' which is 0x0030.

**brewbuck** · 01-13-2008

Assuming that the wide char encoding and the non-wide char encoding are equivalent enough to directly compare character values is probably unsafe.
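
If the two encodings cannot be assumed to line up, one option is to let the library do the mapping rather than comparing raw values. A rough sketch using the standard `std::ctype<wchar_t>::widen` facet member; the helper names `widenChar` and `isWideZero` are made up for illustration, and the classic "C" locale is an assumption, not something the thread specifies:

```cpp
#include <locale>

// Hypothetical helper: widens a narrow character through the ctype facet of
// `loc`, so the mapping comes from the implementation instead of from an
// assumption about how the two encodings relate.
wchar_t widenChar(char c, const std::locale& loc = std::locale::classic())
{
    return std::use_facet<std::ctype<wchar_t> >(loc).widen(c);
}

// Hypothetical helper: true if `w` is the wide form of the digit zero.
bool isWideZero(wchar_t w)
{
    return w == widenChar('0');
}
```

This compares wide against wide, so the question of whether '0' and L'0' share a value never comes up in the calling code.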

**brewbuck** · 01-13-2008

Originally Posted by Elysia

I thought I just busted that. There is only one value in wchar_t's range (word) that is equal to '0' and that is 0x30, the same as L'0' which is 0x0030.

"Wide" doesn't have to mean "Unicode." This is an unsafe assumption.

**Daved** · 01-13-2008

>> The question is. Has the multibyte char L'0' the same value as the char '0' ?
That is not my question. My question is whether there are any other wchar_t values that will return true when compared to '0'. Will the wchar_t be converted to char, lose some of its information, and become '0'?
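
The conversion part of that question can be checked at compile time rather than by reading assembly. A small sketch, assuming a C++11 compiler (for `decltype` and `static_assert`); `ComparedAs` and `equalsNarrow` are names invented for this example. It relies on `+` and `==` both applying the usual arithmetic conversions to their operands:

```cpp
#include <type_traits>

// The common type the compiler converts a wchar_t and a char operand to.
// Binary + applies the same usual arithmetic conversions as ==.
using ComparedAs = decltype(wchar_t() + char());

// Both operands are promoted first, so the common type is never char.
static_assert(!std::is_same<ComparedAs, char>::value,
              "wchar_t/char comparison must not be carried out in char");
static_assert(sizeof(ComparedAs) >= sizeof(wchar_t),
              "common type must be at least as wide as wchar_t");

// Hypothetical helper wrapping the mixed comparison under discussion.
bool equalsNarrow(wchar_t w, char c)
{
    return w == c;
}
```

With this, a wide value such as 0x130 does not compare equal to a char holding 0x30, because nothing is truncated to char before the comparison. It says nothing about whether '0' and L'0' have the same value, which is a separate question about encodings.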

**iMalc** · 01-13-2008

Just don't try comparing chars to wchar_ts. Create a traits class with explicit specialisations for char and wchar_t.
Then do:

Code:

if (w == specialCharTraits<T>::zero())

**Elysia** · 01-13-2008

So for a new test I tried,

Code:

	wchar_t w = 0;
	for (int i = 0; i <= 0xFFFF; i++, w++)
	{
		if (w == '0' && w == L'0') cout << "YES! w == '0' && w == L'0'!\n";
		else if (w == '0') cout << "YES! w == '0'!\n";
		else if (w == L'0') cout << "AND YES! w == L'0'!\n";
	}
	unsigned char c = 0;
	for (int i = 0; i < 0xFF; i++, c++)
	{
		if (c == '0' && w == L'0') cout << "YES! w == '0' && w == L'0'!\n";
		else if (c == '0') cout << "YES! w == '0'!\n";
		else if (c == L'0') cout << "AND YES! w == L'0'!\n";
	}

And it outputs
YES! w == '0' && w == L'0'!
YES! w == '0'!

And you don't need to worry about the wchar_t getting implicitly cast to a char. It's not, according to the assembly:

Code:

00401DB0  cmp         di,30h 
00401DB4  jne         wmain+8Ah (401DCAh) 
00401DB6  mov         ecx,dword ptr [__imp_std::cout (403050h)] 
00401DBC  push        4040C0h 
00401DC1  push        ecx  
00401DC2  call        std::operator<<<std::char_traits<char> > (401000h) 
00401DC7  add         esp,8 
00401DCA  inc         edi  
00401DCB  sub         ebx,1 
00401DCE  jne         wmain+70h (401DB0h)

So basically, it stores the data in edi and uses the low-order word di to compare.
The char loop is even easier:

Code:

00401DD7  cmp         bl,30h 
00401DDA  jne         wmain+0C3h (401E03h)
00401DDC  cmp         di,30h 
00401DE0  jne         wmain+0B0h (401DF0h) 
00401DE2  mov         edx,dword ptr [__imp_std::cout (403050h)] 
00401DE8  push        404108h 
00401DED  push        edx  
00401DEE  jmp         wmain+0BBh (401DFBh) 
		else if (c == '0') cout << "YES! w == '0'!\n";
00401DF0  mov         eax,dword ptr [__imp_std::cout (403050h)] 
00401DF5  push        404128h 
00401DFA  push        eax  
00401DFB  call        std::operator<<<std::char_traits<char> > (401000h) 
00401E00  add         esp,8 
00401E03  inc         bl   
00401E05  sub         ebp,1 
00401E08  jne         wmain+97h (401DD7h)

**Daved** · 01-13-2008

>> And you don't need to worry about the wchar_t getting implicitly cast to a char. It's not, according to the assembly

How does the assembly output of one compiler on a specific platform answer the general question? The whole point is whether you can guarantee it will be safe and there won't be any false positives, so the fact that it seems to work in one instance doesn't help much.

>> So basically, it stores the data in edi and uses the low-order word di to compare.

Isn't that the same as converting to char? What if '0' has different data in the high-order word than the wchar_t character you are comparing? They would compare as equal when they are not.

**Elysia** · 01-13-2008

Originally Posted by Daved

>> And you don't need to worry about the wchar_t getting implicitly cast to a char. It's not, according to the assembly

How does the assembly output of one compiler on a specific platform answer the general question? The whole point is whether you can guarantee it will be safe and there won't be any false positives, so the fact that it seems to work in one instance doesn't help much.

You're free to do your own tests, of course.
My tests show it's safe on Visual Studio 2008. More than that I cannot guarantee unless it's mentioned in the standard.

>> So basically, it stores the data in edi and uses the low-order word di to compare.

Isn't that the same as converting to char? What if '0' has different data in the high-order word than the wchar_t character you are comparing? They would compare as equal when they are not.

wchar_t is 2 bytes (so a word) on Visual Studio, so the assembly is perfectly fine.
I believe it increments the full edi for performance reasons, as incrementing a 32-bit register would be faster than incrementing only a word, though I'm not sure. I'm no expert.

**iMalc** · 01-13-2008

There's no need to simply rely on observed behaviour. Just write it properly to begin with. Here is how to do it, which I meant to write this morning when I was in a mad rush:

Code:

template<typename T> class specialCharTraits {};
template<> struct specialCharTraits<char> {
	static char zero() { return '0'; }
};
template<> struct specialCharTraits<wchar_t> {
	static wchar_t zero() { return L'0'; }
};

Then do this where you use it:

Code:

if (w == specialCharTraits<T>::zero()) cout << "YES! w == '0'!\n";
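
For completeness, a sketch of how the traits class might be used from code that is templated on the character type. The `countZeros` function is a hypothetical example, not something from earlier in the thread; the traits definitions repeat the ones above so the block compiles on its own:

```cpp
#include <cstddef>
#include <string>

template<typename T> struct specialCharTraits {};
template<> struct specialCharTraits<char> {
	static char zero() { return '0'; }
};
template<> struct specialCharTraits<wchar_t> {
	static wchar_t zero() { return L'0'; }
};

// Hypothetical example: counts the '0' digits in a string of either
// character type. Each element is compared against a literal of its own
// type, so char and wchar_t are never mixed in one comparison.
template<typename T>
std::size_t countZeros(const std::basic_string<T>& text)
{
	std::size_t count = 0;
	for (typename std::basic_string<T>::size_type i = 0; i < text.size(); ++i)
	{
		if (text[i] == specialCharTraits<T>::zero()) ++count;
	}
	return count;
}
```

The same function body then works for `std::string` and `std::wstring`, and using it with any other character type fails to compile because the primary template has no `zero()`.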
