-
MultiByteToWideChar
I need to convert a UTF-8 file to Unicode (UTF-16) and display it on screen.
I know MultiByteToWideChar can do this conversion.
Let's say I write:
Code:
CHAR buf[200];
int num=MultiByteToWideChar(CP_UTF8, 0, buf, sizeof(buf), NULL, 0);
After the call, num holds the number of wide characters (WCHAR) needed to hold the converted contents of buf.
Now,
buf is 200 bytes long,
so it is possible that MultiByteToWideChar will not convert some of the last bytes of buf, because on their own they are not a valid UTF-8 sequence.
So my question is:
how can I choose how many bytes of buf to convert so that they hold exactly a whole number of multi-byte characters?
Thanks.
-
Use strlen() or lstrlenA() to find the length of the string contained in buf; sizeof(buf) is the size of the array, not of the text in it, so it is not relevant here. Alternatively, you can pass -1 to MultiByteToWideChar to indicate that the string is null-terminated.
Code:
CHAR buf[200] = "This is some UTF8 string.";
// strlen() returns size_t, so cast to int; + 1 includes the terminating null.
int num=MultiByteToWideChar(CP_UTF8, 0, buf, (int)strlen(buf) + 1, NULL, 0);
or
Code:
CHAR buf[200] = "This is some UTF8 string.";
int num=MultiByteToWideChar(CP_UTF8, 0, buf, -1, NULL, 0);
-
OK, but that is not what I'm asking.
I'm reading a multi-byte string. Each character in this string can be encoded with a different number of bytes.
Let's say buf is 11 bytes long.
The text in the file consists of 10 characters: the first 9 are English and the last is Russian. UTF-8 uses 1 byte for each English character and 2 or more bytes for the Russian character.
So in this particular example,
do you see the problem I'm talking about?
(buf has room for all the English characters but maybe only for half of the Russian character, so I can't just write CHAR buf[11].)
(By the way, in the example the file is only 11 bytes long, but it can actually be 500 KB, so reading the entire text into memory is not very practical.)
Hope that is clearer.
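One way to deal with a chunk that ends in the middle of a character is to trim it back to the last whole UTF-8 sequence before converting. The sketch below is only an illustration under assumptions: utf8_whole_prefix is a made-up helper name, not a Windows API, and it assumes the data is otherwise well-formed UTF-8. It uses the fact that continuation bytes always look like 10xxxxxx and that the lead byte tells how long the sequence is.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the Windows API).
   Returns how many of the first len bytes of buf can be converted
   without cutting the last UTF-8 sequence in half.
   Assumption: the input is otherwise well-formed UTF-8. */
size_t utf8_whole_prefix(const unsigned char *buf, size_t len)
{
    size_t pos = len;
    size_t need;
    unsigned char lead;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (pos > 0 && len - pos < 3 && (buf[pos - 1] & 0xC0) == 0x80)
        --pos;

    if (pos == 0)
        return len;   /* empty buffer, or no lead byte in range */

    --pos;            /* pos now indexes the candidate lead byte */
    lead = buf[pos];

    if ((lead & 0x80) == 0x00)      need = 1;  /* 0xxxxxxx: ASCII */
    else if ((lead & 0xE0) == 0xC0) need = 2;  /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0) need = 3;  /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0) need = 4;  /* 11110xxx */
    else
        return len;   /* not a lead byte: leave it to the converter */

    /* If the last sequence is incomplete, stop just before its lead byte. */
    return (len - pos < need) ? pos : len;
}
```

With something like this, the file could be read in fixed-size chunks: call the helper on the bytes just read, pass only that many bytes as the cbMultiByte argument of MultiByteToWideChar, then move the leftover bytes (at most 3) to the start of buf and read the next chunk after them. For the example above, a chunk holding the 9 English bytes plus only the first byte of the Russian character would be trimmed to 9 bytes.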