Well, this kind of works, but the problem is that the output looks fine even though the current locale (an ASCII locale such as POSIX or C) does not contain most of the characters in the file. If setlocale() were implemented correctly, I would expect the program not to print those Unicode characters when the locale is not UTF-8. If I open the same file in Vim with a POSIX locale, it looks completely different than it does with a UTF-8 locale. By the way, the output stays the same even if I remove the setlocale() call, which leads me to my next question...
But that is expected since, if I understand correctly, fgets/printf only read/print a stream of n bytes and have no concept of encodings or multibyte characters. So %.10s with printf on a multibyte string should not print 10 Unicode characters; it should print 10 bytes. Well, I tried this with the code below, changing printf("%s", buf) to printf("%.80s", buf), and guess what? The output still looks exactly the same, which suggests that printf IS somehow aware of multibyte strings, because 80 Unicode characters are printed instead of 80 bytes. Each line is between 80 and 80*MB_LEN_MAX bytes long, so there's no way that should work if printf only counts bytes. And how the heck does printf know that buf contains multibyte UTF-8 characters anyway?
Code:
#include <locale.h>
#include <stdio.h>

int main(void)
{
    const char *filename = "UTF-8-demo.txt";
    char buf[80];
    FILE *fp;

    /* Adopt the locale from the environment (LANG, LC_ALL, ...). */
    if (!setlocale(LC_CTYPE, ""))
    {
        fprintf(stderr, "Failed to set the specified locale\n");
        return 1;
    }

    fp = fopen(filename, "r");
    if (!fp)
    {
        perror(filename);
        return 1;
    }

    /* fgets stores at most sizeof buf - 1 bytes per call,
       so longer lines arrive in several pieces. */
    while (fgets(buf, sizeof buf, fp))
        printf("%s", buf);

    fclose(fp);
    return 0;
}