MultiByteToWideChar

  1. #1
    Registered User
    Join Date
    Mar 2004
    Posts
    36

    MultiByteToWideChar

    I need to convert a UTF-8 file to Unicode and display it on screen.
    I know MultiByteToWideChar can do this conversion.

    Let's say I write:


    Code:
    CHAR buf[200];  // filled with UTF-8 bytes read from the file
    
    int num=MultiByteToWideChar(CP_UTF8, 0, buf, sizeof(buf), NULL, 0);
    After the call, num holds the number of wide characters (WCHAR, not TCHAR) that the converted contents of buf require.
    Now, buf is 200 bytes long, so its last few bytes may hold only part of a multi-byte character. MultiByteToWideChar cannot convert those bytes correctly because they do not form a complete UTF-8 sequence.

    So my question is:
    How can I size buf so that it holds exactly a whole number of multi-byte characters?

    Thanks.

  2. #2
    Yes, my avatar is stolen
    Join Date
    Dec 2002
    Posts
    2,544
    Use strlen() or lstrlenA() to find the length of the string contained in buf; sizeof(buf) is not relevant. Alternatively, pass -1 to MultiByteToWideChar to indicate that the string is null-terminated. In both forms the terminating null is included in the conversion, so num counts it as well.

    Code:
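    CHAR buf[200] = "This is some UTF8 string.";
    
    // strlen() returns size_t, so cast to the int the length parameter expects
    int num=MultiByteToWideChar(CP_UTF8, 0, buf, (int)strlen(buf) + 1, NULL, 0);
    or
    Code:
    CHAR buf[200] = "This is some UTF8 string.";
    
    // -1 tells the function that buf is null-terminated
    int num=MultiByteToWideChar(CP_UTF8, 0, buf, -1, NULL, 0);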
    Last edited by anonytmouse; 07-21-2004 at 08:38 PM.

  3. #3
    Registered User
    Join Date
    Mar 2004
    Posts
    36
    OK, but that is not what I'm asking.

    I'm reading a multi-byte string. Characters in this string can be encoded with different numbers of bytes.

    Let's say buf is 11 bytes long.
    The text in the file consists of 10 characters: the first 9 are English and the last is Russian. UTF-8 uses 1 byte for each English character and 2 or more bytes for the Russian character.

    So in this particular example, do you see the problem I'm talking about?
    (buf has room for all the English characters but maybe only half of the Russian character, so I can't just write CHAR buf[11].)

    (By the way, the example file is only 11 bytes long, but a real file can be 500 KB, so reading the entire text into memory is not very practical.)

    Hope that is clearer.
    Last edited by Jumper; 07-22-2004 at 04:41 AM.
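
The split-character problem described in post #3 can be approached by trimming each chunk back to the last complete UTF-8 sequence before converting it. The following is a minimal sketch in portable C, not something taken from the thread: the names utf8_seq_len and utf8_complete_prefix are hypothetical, and the code assumes the input is otherwise well-formed UTF-8. It only inspects the tail of the chunk and leaves full validation to the converter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
 * with this lead byte, or 0 if the byte is not a valid lead byte. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII             */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte sequence   */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte sequence   */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte sequence   */
    return 0;                             /* continuation or invalid     */
}

/* Hypothetical helper: length of the longest prefix of buf[0..len) that
 * does not end in the middle of a multi-byte UTF-8 sequence. */
size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t back;

    assert(buf != NULL || len == 0);

    /* A UTF-8 sequence is at most 4 bytes long, so the lead byte of a
     * truncated sequence must be among the last 4 bytes of the chunk. */
    for (back = 1; back <= 4 && back <= len; ++back) {
        unsigned char b = buf[len - back];
        size_t need;

        if ((b & 0xC0) == 0x80)
            continue;               /* 10xxxxxx: continuation, keep looking */

        need = utf8_seq_len(b);
        if (need > back)
            return len - back;      /* sequence is cut off: drop it */
        return len;                 /* last sequence is complete (or invalid) */
    }

    /* No lead byte found near the end: malformed input, leave it as is. */
    return len;
}
```

For a 10-byte chunk holding nine ASCII letters followed by 0xD0 (the first byte of a two-byte Cyrillic letter), utf8_complete_prefix returns 9. That value can then be passed as the byte count to MultiByteToWideChar, and the leftover bytes can be moved to the front of buf before the next read appends to them. This lets the buffer keep a fixed size such as 200 bytes while the file is processed piece by piece.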
