Originally Posted by
Codeplug
According to GNU LibC manual, strcasestr() is locale dependent. So if you haven't loaded a UTF8 locale, then it makes no sense to pass a UTF8 string to strcasestr(). As you already mentioned, a "character" in UTF8 can be encoded with multiple code points (bytes in this case). The "case" in strcasestr() applies to "character" case. So the function needs to know how the characters are encoded.
Well, in the test cases the UTF-8 characters were not members of a non-ASCII alphabet; they were things like the typographic apostrophe, which has no ASCII value (only the single quote ' and the backtick ` do) and so is encoded as multiple bytes. But since they could be letters, I'm going to have to do some research.
>>
I'm guessing this is some two's complement-related issue again
No, the value of a pointer is the same regardless of the type to which it points (signed or unsigned).
gg
Hopefully someone will explain it tho -- consider this:
Code:
#include <string.h>
#include <stdio.h>
int main() {
    const char haystack[]="this and that", needle[]="and";
    char *ptr=strstr(haystack,needle);
    printf("%p %s\n",ptr,ptr);
    ptr=strcasestr(haystack,needle);
    printf("%p %s\n",ptr,ptr);
    return 0;
}
You would expect the results to be the same. However, yesterday on large files the pointer address was different, but the string it pointed to was the same!
Today, the code above is actually segfaulting for me at the second assignment of ptr. So much for strcasestr!
I imagine this relates to the warning from gcc:
test.c: In function ‘main’:
test.c:8: warning: assignment makes pointer from integer without a cast
However, in the GNU manual it says of strcasestr:
This is like strstr, except that it ignores case in searching for the substring.
So why a warning for one but not the other?