To make my library both portable and future-proof (in case more than 32 bits are ever needed), I'm defining my own character encoding, to be used at runtime only and not in files or network protocols. For those I'm writing converters for the current common standards: UTF-8, UTF-16 and UTF-32. At the moment I'm doing the UTF-8 conversion and am struggling to work out the last bit-shifting step for a character.
My format is defined as follows:
1x... means "read the next character as the bottom part of this character", and x... always holds bits of the Unicode code point. So two of my 16-bit characters converted to a 28-bit UTF-32 character would wind up looking like:
Code:
char32_t c32 = vc[0] & PAWVC_BOTTOM; /* top part, from the first character */
c32 <<= PAWVC_WIDTH;
c32 |= (vc[1] & PAWVC_BOTTOM);       /* bottom part, from the next character */
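To make the combining step concrete, here is a minimal sketch of a decoder for this format. It assumes a 16-bit unit with 15 payload bits, so the values given to PAWVC_WIDTH and PAWVC_BOTTOM below are guesses, and PAWVC_MORE and pawvc_decode are hypothetical names that are not part of the library. It uses uint32_t instead of char32_t only to avoid depending on <uchar.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for a 16-bit unit; the real PAWVC_* macros may differ. */
#define PAWVC_WIDTH  15                          /* payload bits per unit  */
#define PAWVC_BOTTOM ((1u << PAWVC_WIDTH) - 1u)  /* 0x7FFF, payload mask   */
#define PAWVC_MORE   (1u << PAWVC_WIDTH)         /* 0x8000, "read next"    */

/* Decode one code point from at most n units starting at vc.
 * Returns the number of units consumed, or 0 if the input ends
 * while the "read next" flag is still set. */
static size_t pawvc_decode(const uint16_t *vc, size_t n, uint32_t *out)
{
    uint32_t c32 = 0;
    size_t i = 0;

    assert(vc != NULL && out != NULL);
    while (i < n) {
        uint16_t unit = vc[i++];
        /* Make room for the next payload, then append it. */
        c32 = (c32 << PAWVC_WIDTH) | (unit & PAWVC_BOTTOM);
        if (!(unit & PAWVC_MORE)) {
            *out = c32;
            return i;
        }
    }
    return 0; /* truncated character */
}
```

With these assumed values, the units 0x8003 and 0x7445 decode to 0x1F445. A uint32_t only has room for two full 15-bit payloads, so longer sequences would need a wider accumulator.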
The character type is defined to always be at least 16 bits wide, via short or long (a non-conforming system with CHAR_BIT = 4 would always end up with a long 16 bits wide), thus ensuring L"" would be the only valid way to assign a string literal to it. Would someone mind taking a look at the code below and helping me fix the last part, which extracts the last applicable bits?
Code:
one = *src & PAWVC_BOTTOM;
two = src[1];
C = dst + n;
if ( bits > 18 )
{
    i += 2;
    if ( bits <= PAWL16D_WIDTH - 1 )
    {
        /* Only the first character is used here */
        C[3] = 0x80 | (one & 0x3F);
        C[2] = 0x80 | ((one >> 6) & 0x3F);
        C[1] = 0x80 | ((one >> 12) & 0x3F);
        C[0] = 0xF0 | ((one >> 18) & 0x07);
        continue;
    }
    /* Low bits come from the second character */
    C[3] = 0x80 | (two & 0x3F);
    C[2] = 0x80 | ((two >> 6) & 0x3F);
    C[1] = 0x80 | ((two >> 12) & 0x3F);
    C[0] = 0xF0 | (two >> 18);
    /* Merge in the bits from the first character (the part I'm stuck on) */
#if PAWL16D_WIDTH - 12 >= 6
    C[0] |= one & ~(-1 << (PAWL16D_WIDTH - 18));
#else
    C[1] |= (one & ~(-1 << (PAWL16D_WIDTH - 12)));
#endif
}
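For comparison, one way to avoid splitting the shifts between one and two is to combine both characters into a single 32-bit code point first and only then cut it into 6-bit groups. The sketch below is the standard UTF-8 encoding of an already combined code point, not the code above with its last part fixed. utf8_encode is a hypothetical name, and the caller is assumed to supply a buffer of at least 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into dst (at least 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not encodable
 * (a surrogate, or above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char *dst)
{
    assert(dst != NULL);
    if (cp < 0x80u) {                       /* 0xxxxxxx */
        dst[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {                      /* 110xxxxx 10xxxxxx */
        dst[0] = (unsigned char)(0xC0u | (cp >> 6));
        dst[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp >= 0xD800u && cp <= 0xDFFFu)
        return 0;                           /* surrogates are not valid in UTF-8 */
    if (cp < 0x10000u) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        dst[0] = (unsigned char)(0xE0u | (cp >> 12));
        dst[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        dst[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    if (cp < 0x110000u) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        dst[0] = (unsigned char)(0xF0u | (cp >> 18));
        dst[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
        dst[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        dst[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

For example, the code point 0x1F445 comes out as the four bytes F0 9F 91 85. Because every shift is applied to the combined value, no preprocessor branch on the character width is needed in the encoder itself.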