I'm writing a library for converting text to and from UTF-32. Part of the reason is that iconv() gives no way to determine how much memory a conversion will need before performing it.
The other reason is that MultiByteToWideChar()/WideCharToMultiByte() are awkward to work with. I need at least char, UTF-8, UTF-16, UTF-32 and wchar_t support by default, so I'm writing the LE variants first and will move on to the BE variants once I have the LE ones to base them on.
This is what I have for UTF16-LE so far:
Code:
int64_t libpawmbe_getc( void const *src, size_t lim, size_t *did )
{
char16_t const *txt = src;
char16_t c;
if ( lim < sizeof(char16_t) )
return -PAWMSGID_INCOMPLETE;
c = txt[0]; /* read only after confirming a full 16-bit unit is available */
if ( PAWINTU_BEWTEEN(0xDC00,c,0xDFFF) )
return -PAWMSGID_INVALIDPOS;
if ( PAWINTU_BEWTEEN(0xD800,c,0xDBFF) )
{
if ( lim < sizeof(char32_t) )
return -PAWMSGID_INCOMPLETE;
*did = sizeof(char32_t);
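/* a surrogate pair encodes (code point - 0x10000), so add the offset back */
return 0x10000 + (((char32_t)(c & 0x3FF) << 10) | (txt[1] & 0x3FF));
}
*did = sizeof(char16_t);
/* every non-surrogate unit, including 0xE000-0xFFFF, is the code point itself */
return c;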
}
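For comparison, here is a minimal standalone sketch of decoding one code point from UTF-16LE. It is not the library's actual code: the `sketch_` names and the `SKETCH_*` error codes are placeholders I made up to stand in for the PAWMSGID_* values. It assumes the usual UTF-16 rules (a high surrogate 0xD800-0xDBFF must be followed by a low surrogate 0xDC00-0xDFFF, and the pair encodes the code point minus 0x10000). It also reads the input byte by byte, so the result does not depend on the host's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes standing in for the PAWMSGID_* values. */
enum { SKETCH_INCOMPLETE = 1, SKETCH_INVALIDPOS = 2 };

/* Load one 16-bit unit stored low byte first (little-endian). */
static uint16_t sketch_load16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Decode one code point from UTF-16LE bytes.
 * Returns the code point (>= 0) and stores the bytes consumed in *did,
 * or returns a negative SKETCH_* code and leaves *did untouched. */
int64_t sketch_utf16le_getc(const void *src, size_t lim, size_t *did)
{
    const unsigned char *p = src;
    uint16_t hi, lo;

    assert(src != NULL && did != NULL);
    if (lim < 2)
        return -SKETCH_INCOMPLETE;
    hi = sketch_load16le(p);

    /* A low surrogate may not start a sequence. */
    if (hi >= 0xDC00 && hi <= 0xDFFF)
        return -SKETCH_INVALIDPOS;

    if (hi >= 0xD800 && hi <= 0xDBFF) {
        if (lim < 4)
            return -SKETCH_INCOMPLETE;
        lo = sketch_load16le(p + 2);
        /* The second unit must be a low surrogate. */
        if (lo < 0xDC00 || lo > 0xDFFF)
            return -SKETCH_INVALIDPOS;
        *did = 4;
        /* 10 bits from each unit, plus the 0x10000 offset. */
        return 0x10000 + (((int64_t)(hi & 0x3FF) << 10) | (lo & 0x3FF));
    }

    /* Any other unit is the code point itself. */
    *did = 2;
    return hi;
}
```

With this sketch, the bytes 3D D8 00 DE decode to U+1F600 with 4 bytes consumed, and 00 E0 decodes to U+E000 unchanged.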
I'm confident I've understood the other formats correctly, but not this one. wchar_t will be handled the same way I handled char, with a temporary "hack" that uses the mbstate_t-related functions.
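
To illustrate what an mbstate_t-based stopgap for char might look like, here is a rough sketch built on the standard mbrtowc() from <wchar.h>. The function name and the `SKETCH_MB_*` codes are hypothetical, not part of the library. It also assumes that wchar_t values are Unicode code points, which only holds on platforms that define __STDC_ISO_10646__, and that the input is in the current locale's encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical error codes standing in for the PAWMSGID_* values. */
enum { SKETCH_MB_INCOMPLETE = 1, SKETCH_MB_INVALID = 2 };

/* Decode one character from a locale-encoded char buffer.
 * Returns the wchar_t value (>= 0) and stores the bytes consumed in *did,
 * or returns a negative SKETCH_MB_* code. */
int64_t sketch_char_getc(const void *src, size_t lim, size_t *did)
{
    mbstate_t st;
    wchar_t wc = 0;
    size_t n;

    assert(src != NULL && did != NULL);
    if (lim == 0)
        return -SKETCH_MB_INCOMPLETE;

    /* Start from the initial shift state on every call. */
    memset(&st, 0, sizeof st);
    n = mbrtowc(&wc, src, lim, &st);

    if (n == (size_t)-2)   /* valid so far, but truncated */
        return -SKETCH_MB_INCOMPLETE;
    if (n == (size_t)-1)   /* not a valid sequence in this locale */
        return -SKETCH_MB_INVALID;

    /* mbrtowc returns 0 for the NUL character, which is one byte long. */
    *did = n ? n : 1;
    return (int64_t)wc;
}
```

Resetting the state on each call keeps the function stateless like the UTF-16 one, at the cost of not supporting encodings that carry shift state between characters.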