Hello ladies and gentlemen,
I'm trying to implement palindrome detection on a UTF-8 encoded file. A palindrome is a word or sentence that reads the same forwards and backwards, for example "kayak" or "radar".
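For plain ASCII text, the check itself is a two-pointer scan from both ends. Here is a minimal sketch, assuming a NUL-terminated line. The name is_ascii_palindrome is hypothetical. It ignores case and skips anything isalnum rejects, such as spaces, punctuation and the trailing '\n' that fgets keeps:

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the ASCII string s reads the same in both directions,
   ignoring case and every character that is not a letter or digit. */
int is_ascii_palindrome(const char *s)
{
    size_t i = 0;
    size_t j = strlen(s);            /* j is one past the right-hand character */

    while (i < j) {
        unsigned char a = (unsigned char)s[i];
        unsigned char b = (unsigned char)s[j - 1];
        if (!isalnum(a)) { i++; continue; }   /* skip on the left */
        if (!isalnum(b)) { j--; continue; }   /* skip on the right */
        if (tolower(a) != tolower(b))
            return 0;
        i++;
        j--;
    }
    return 1;
}
```

In the default "C" locale, isalnum rejects every byte above 0x7F. The bytes of an accented letter such as ë are therefore silently skipped, and that is exactly where UTF-8 input needs more work.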
While it was a piece of cake in C#, I'm struggling with the C version because of the UTF-8 encoding.
I'm reading and storing each line of the file using:
Code:
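while (fgets(line, sizeof(line), file)) {
    size_t len = strlen(line);
    lines[nb_lines] = malloc(len + 1);      // exact size instead of MAX_LINE_LENGTH
    if (lines[nb_lines] == NULL)
        break;                              // out of memory
    strcpy(lines[nb_lines], line);
    if (++nb_lines == lines_alloc) {        // pre-increment needed!
        char **tmp = realloc(lines, 2 * lines_alloc * sizeof(char *));
        if (tmp == NULL)
            break;                          // keep the old array on failure
        lines = tmp;
        lines_alloc *= 2;
    }
}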
I'm not sure it's done properly, but so far I can display each line just fine.
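The same loop can be wrapped into a function. The sketch below assumes a MAX_LINE_LENGTH of 1024 and uses the hypothetical name read_lines. It strips the line ending with strcspn and allocates each copy at its exact length. Lines that do not fit in the buffer are split by fgets into several entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LENGTH 1024   /* assumed buffer size */

/* Reads every line of file into a growable array of heap strings and stores
   the number of lines in *count. Returns NULL only if the first allocation
   fails; a later allocation failure stops reading early. */
char **read_lines(FILE *file, size_t *count)
{
    char line[MAX_LINE_LENGTH];
    size_t nb_lines = 0, lines_alloc = 16;
    char **lines = malloc(lines_alloc * sizeof *lines);

    if (lines == NULL)
        return NULL;

    while (fgets(line, sizeof line, file)) {
        line[strcspn(line, "\r\n")] = '\0';       /* drop the line ending */
        char *copy = malloc(strlen(line) + 1);    /* exact size */
        if (copy == NULL)
            break;
        strcpy(copy, line);
        if (nb_lines == lines_alloc) {            /* grow before writing */
            char **tmp = realloc(lines, 2 * lines_alloc * sizeof *lines);
            if (tmp == NULL) { free(copy); break; }
            lines = tmp;
            lines_alloc *= 2;
        }
        lines[nb_lines++] = copy;
    }
    *count = nb_lines;
    return lines;
}
```

The caller frees each lines[i] and then lines itself.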
The problematic part: I'm trying to remove diacritics with:
Code:
char toNonDiacritic(char c){
    switch(c)
    {
    case 'à':
    case 'â':
    case 'ä':
    case 'ã':
        return 'a';
        break;
    //etc. for e, i, o, u
    }
    return c;
}
This does NOT work: the terminal shows garbled characters where the diacritics were supposedly removed.
Important information: both the file and my terminal use UTF-8.
What I think I understand so far: UTF-8 encodes each character in 1 to 4 bytes, while a "char" is 1 byte. So my problematic characters are probably encoded in more than one byte, which makes toNonDiacritic ineffective.
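That understanding is correct. In UTF-8, ë is stored as the two bytes 0xC3 0xAB, so no single char can hold it. Likewise, in a source file saved as UTF-8, 'à' is a multi-character constant with an implementation-defined value (GCC warns about it). It can never equal a char read from the file. The lead byte tells you how long each sequence is, as in this small sketch (the function names are hypothetical):

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence whose first byte is b, read from
   its high bits; 0 for a continuation byte (10xxxxxx) or 0xF8..0xFF.
   This sketch does not reject overlong lead bytes such as 0xC0 and 0xC1. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* 10xxxxxx or 11111xxx */
}

/* Counts characters (code points) rather than bytes by skipping
   continuation bytes. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For example, utf8_char_count("Noël") gives 4, while the string occupies 5 bytes.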
What I can't figure out: a simple way to turn a multibyte character into something that can be compared in a switch. So far I tried the wchar_t type with mbstowcs, without success. Displaying the result with either printf or wprintf, with or without the L prefix, gives me things like:
No├½l a trop par rapport a L├®on.
(Original: Noël a trop par rapport a Léon. Ignoring accents, case and spaces, it is a palindrome.)
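Two things may be going on here. First, mbstowcs converts according to the LC_CTYPE category of the current locale. A C program starts in the "C" locale until it calls setlocale, so without that call the bytes are not interpreted as UTF-8. Second, the garbled output is consistent with UTF-8 bytes being displayed by a console using code page 850: 0xC3 0xAB shows as ├½ and 0xC3 0xA9 as ├®. That suggests the terminal may not actually be in UTF-8 mode.

Here is a minimal sketch of the wide-character route. It assumes a UTF-8 locale is installed and that wchar_t holds Unicode code points, which is true on glibc (it defines __STDC_ISO_10646__). The names to_wide and fold_wide are hypothetical:

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts the multibyte string mb to a wide string in out (cap elements,
   terminator included), using the encoding of the current LC_CTYPE locale.
   Returns the number of wide characters, or (size_t)-1 if mb is invalid in
   that encoding or does not fit in out together with its terminator. */
size_t to_wide(const char *mb, wchar_t *out, size_t cap)
{
    size_t n = mbstowcs(out, mb, cap);
    if (n == (size_t)-1 || n >= cap)
        return (size_t)-1;
    return n;
}

/* toNonDiacritic on whole characters: each case is now a single wide value.
   Assumes wchar_t values are Unicode code points. */
wchar_t fold_wide(wchar_t c)
{
    switch (c) {
    case L'\u00E0': case L'\u00E2': case L'\u00E4': case L'\u00E3':  /* à â ä ã */
        return L'a';
    case L'\u00E8': case L'\u00E9': case L'\u00EA': case L'\u00EB':  /* è é ê ë */
        return L'e';
    default:
        return c;
    }
}
```

Call setlocale(LC_ALL, "") once at the start of main so that the environment's locale is used. Also avoid mixing printf and wprintf on stdout. The first output call fixes the stream's orientation (byte or wide), and the C standard does not allow output functions of the other kind on that stream afterwards.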
Any ideas?
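If you would rather not depend on locales at all, you can decode UTF-8 yourself into code points (uint32_t) and switch on those, which is what toNonDiacritic was aiming for. The sketch below rests on these assumptions:

- Only the Latin-1 letters listed in fold are de-accented.
- Other non-ASCII characters are kept as-is.
- Overlong forms and surrogates are not rejected.
- A line holds at most 1024 letters.

The names utf8_decode, fold and is_utf8_palindrome are hypothetical:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the UTF-8 sequence at s into *cp and returns its length in bytes.
   A malformed sequence (including one cut short by '\0') yields U+FFFD and
   consumes a single byte. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    uint32_t c = s[0];
    size_t len;

    if (c < 0x80) { *cp = c; return 1; }                  /* 0xxxxxxx */
    if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; }       /* 110xxxxx */
    else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; }  /* 1110xxxx */
    else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; }  /* 11110xxx */
    else { *cp = 0xFFFD; return 1; }   /* continuation byte or invalid lead */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3Fu);
    }
    *cp = c;
    return len;
}

/* Lowercases ASCII and Latin-1 letters, then strips the accents listed. */
static uint32_t fold(uint32_t c)
{
    if (c < 0x80)
        return (uint32_t)tolower((int)c);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        c += 0x20;                                  /* À..Þ -> à..þ */
    switch (c) {
    case 0xE0: case 0xE2: case 0xE3: case 0xE4: return 'a';  /* à â ã ä */
    case 0xE7: return 'c';                                    /* ç */
    case 0xE8: case 0xE9: case 0xEA: case 0xEB: return 'e';  /* è é ê ë */
    case 0xEE: case 0xEF: return 'i';                         /* î ï */
    case 0xF4: case 0xF5: case 0xF6: return 'o';              /* ô õ ö */
    case 0xF9: case 0xFB: case 0xFC: return 'u';              /* ù û ü */
    default: return c;
    }
}

/* Returns 1 if the UTF-8 string line is a palindrome after folding,
   ignoring ASCII characters that are not letters or digits. */
int is_utf8_palindrome(const char *line)
{
    uint32_t cps[1024];                 /* assumed limit on letters per line */
    size_t n = 0;
    const unsigned char *p = (const unsigned char *)line;

    while (*p != '\0') {
        uint32_t cp;
        p += utf8_decode(p, &cp);
        cp = fold(cp);
        if (cp < 0x80 && !isalnum((int)cp))
            continue;                   /* spaces, punctuation, '\n' */
        if (n == sizeof cps / sizeof cps[0])
            return 0;                   /* too long for this sketch */
        cps[n++] = cp;
    }
    for (size_t i = 0; i < n / 2; i++)
        if (cps[i] != cps[n - 1 - i])
            return 0;
    return 1;
}
```

With this, is_utf8_palindrome("Noël a trop par rapport a Léon.") returns 1, because folding turns the line into noelatropparrapportaleon.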