I'm working on a program that will conjugate words in different languages. My host language is currently Quenya. One of the features I'm trying to tackle is handling special characters such as accented vowels and vowels with a diaeresis, converting them to their plain counterparts (or the other way around).
Below is a sample of the code I am using as an initial test of this concept. For now I am only trying to isolate the letter "ë" in "quentë" and print it. Once that works, I will try to detect "ë" in a different word and replace it with "e".
Code:
#include <stdio.h>
#include <stdlib.h>   /* mbstowcs, wcstombs */
#include <locale.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    if (!setlocale(LC_CTYPE, "")) {
        fprintf(stderr, "Can't set the specified locale! "
                        "Check LANG, LC_CTYPE, LC_ALL.\n");
        return 1;
    }

    wchar_t wstring[10];   /* real storage instead of an uninitialized pointer */
    char *string = "quentë";
    char newstring[10];

    printf("the word is %s\n", string);
    int len = mbstowcs(NULL, string, 0);
    printf("mbs string size is %d\n", len); //this shows the correct # of characters.
    mbstowcs(wstring, string, len + 1);     /* len + 1 so the terminating L'\0' is stored too */
    wcstombs(newstring, wstring, sizeof newstring); /* the limit is in bytes, not characters */
    printf("the word is now %s\n", newstring);
    printf("the last character is: %c\n", newstring[5]);
    return 0;
}
Now the output is:
Code:
the word is quentë
mbs string size is 6
the word is now quentë
the last character is:
For some reason the final character will not print. I've tried the other characters with the same code, and they print fine. I've searched for information on UTF-8 strings in C, Linux, GCC, glibc, you name it, but I just can't seem to nail this concept down. Any ideas?