Thread: Reading UNICODE into an edit

  1. #1
    Registered User
    Join Date
    Sep 2004
    Posts
    31

    Reading UNICODE into an edit

    I'm a n00b to Unicode, so go easy on me. I'm also the type who likes to code first and read documentation later.

    I'm trying to read a Unicode file, and I've pretty much got it working. But there is one bit I'm not quite certain about, so here we go.

    I'm opening and reading the file like this:

    To summarize the variables below, the important ones are:
    char buff[] - contains the file contents, read by ReadFile()
    char mbbuff[] - contains the result of WideCharToMultiByte()

    Code:
    HANDLE file = CreateFile(
         "eng.utxt",
         GENERIC_READ,
         0,
         NULL,
         OPEN_EXISTING,
         FILE_ATTRIBUTE_NORMAL,
         NULL
    );
    if( file == INVALID_HANDLE_VALUE )
        MessageBox( NULL, "Failed to open eng.utxt!", "Error", MB_OK | MB_ICONERROR );
    DWORD SIZE = GetFileSize( file, NULL );//40000;
    WCHAR buff[SIZE];
    DWORD numread;
    bool rf = ReadFile(
         file,
         buff,
         SIZE,
         &numread,
         NULL
    );
    if( !rf )
        MessageBox( NULL, "ReadFile() Failed!", "Error", MB_OK | MB_ICONERROR );
    CloseHandle( file );
    char mbbuff[SIZE];
    WideCharToMultiByte(
         CP_UTF8,
         0,
         buff,
         SIZE,
         mbbuff,
         SIZE,
         NULL,
         NULL
    );
    When all is said and done, I just set the edit text to mbbuff[]. I'm assuming I'm just not setting the correct properties on the control, because I'm getting weird characters everywhere, including some that usually appear when you try to write from a buffer that has no text in it. So I'm afraid I've got a bit of a memory problem as well.

  2. #2
    anonytmouse (Yes, my avatar is stolen)
    Join Date
    Dec 2002
    Posts
    2,544
    If the file is in Windows-compatible Unicode (UTF-16LE) and you are using NT/2000/XP, you don't need to perform a conversion. Just send the Unicode text to the edit control with the SetWindowTextW function. Also, make sure the buffer is nul-terminated before use.
    Code:
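    // SIZE is the file size in bytes, so SIZE + 1 elements leaves more than enough room for the terminator.
    WCHAR buff[SIZE + 1];
    DWORD numread = 0;   // stays 0 if ReadFile fails, so the index below is still valid
    BOOL rf = ReadFile(
         file,
         buff,
         SIZE,
         &numread,
         NULL
    );
    // numread counts bytes; divide by sizeof(WCHAR) to get the element index.
    buff[numread / sizeof(WCHAR)] = L'\0';
    SetWindowTextW(hwndEdit, buff);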
    You should be aware that variable-length arrays are a C99 feature that not all compilers implement, notably MS Visual C++. If you want your code to be portable to other Windows compilers, you should use malloc instead.
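
    The malloc and nul-termination advice above can be sketched in portable C. This is only an illustration under stated assumptions: uint16_t stands in for WCHAR, a plain byte array stands in for what ReadFile would return, and make_terminated_utf16 is a hypothetical helper name, not a Win32 function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for WCHAR (assumption for portability): one 16-bit UTF-16 code unit. */
typedef uint16_t utf16_unit;

/*
 * Hypothetical helper: copy `byte_count` raw bytes (the value ReadFile would
 * report in numread) into a freshly malloc'd, nul-terminated UTF-16 buffer.
 * Returns NULL on allocation failure; the caller must free() the result.
 */
utf16_unit *make_terminated_utf16(const unsigned char *bytes, size_t byte_count,
                                  size_t *unit_count)
{
    /* Convert the byte count to a code-unit count; a trailing odd byte is ignored. */
    size_t units = byte_count / sizeof(utf16_unit);

    /* Allocate one extra element for the terminator. */
    utf16_unit *buf = malloc((units + 1) * sizeof(utf16_unit));
    if (buf == NULL)
        return NULL;

    memcpy(buf, bytes, units * sizeof(utf16_unit));
    buf[units] = 0;   /* terminate at the element index, not the byte index */

    if (unit_count != NULL)
        *unit_count = units;
    return buf;
}
```

    The point of the sketch is the unit mismatch: the file size and numread are byte counts, while the array index and the allocation are counted in 16-bit elements.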

    If your file is actually UTF-8, you can convert it to Windows Unicode (UTF-16) using the MultiByteToWideChar function. If this is the case, you may be interested in this post.

    You can convert Windows Unicode to the current multi-byte character set (typically ANSI on English-language versions of Windows) with the WideCharToMultiByte function, passing CP_ACP as the first argument.
    Last edited by anonytmouse; 07-05-2005 at 12:04 AM. Reason: Byte/WCHAR mismatch

  3. #3
    Registered User
    Join Date
    Sep 2004
    Posts
    31
    I see. Well, I tried your method for UTF-16LE and it seems to work, except I'm getting those little boxes [] where I'd get a # (hash) if I opened the file in OpenOffice.org. Also, the junk at the beginning of the string is now a single ? character instead of a lot of other characters.
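
    A single stray character at the very start of the text is often the byte order mark (U+FEFF) that many editors write at the beginning of a UTF-16 file. That is only a guess here, since the thread does not confirm what the file contains. A minimal sketch of skipping it, with uint16_t standing in for WCHAR and skip_bom as a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for WCHAR (assumption for portability): one 16-bit UTF-16 code unit. */
typedef uint16_t utf16_unit;

/* U+FEFF, stored as the bytes FF FE at the start of a UTF-16LE file. */
#define UTF16_BOM 0xFEFFu

/*
 * Hypothetical helper: return a pointer just past a leading byte order mark,
 * or the original pointer if the text does not start with one.
 * `text` is expected to be nul-terminated.
 */
const utf16_unit *skip_bom(const utf16_unit *text)
{
    if (text != NULL && text[0] == UTF16_BOM)
        return text + 1;
    return text;
}
```

    With the Win32 code from post #2, the equivalent would be to pass buff + 1 to SetWindowTextW when buff[0] is 0xFEFF.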

  4. #4
    Darryl (Tropical Coder)
    Join Date
    Mar 2005
    Location
    Cayman Islands
    Posts
    503
    I think you should be using WCHAR for your buffers and not just char, or maybe better TCHAR, which allows for easy transitions from ANSI to Unicode and vice versa.
