Thread: Unicode File I/O

  1. #1
    Registered User
    Join Date
    Nov 2005
    Posts
    88

    Unicode File I/O

    I am trying to write a wstring and a wchar_t array to a file using wofstream. My wstring is populated with a Unicode string I obtained from a registry key name (using RegEnumKeyW). It contains a non-ASCII character that is displayed as a square block in the registry. I display the wstring with MessageBoxW to check that it contains the same text as the registry. It does.

    However, when I write the wstring to a file using wofstream and immediately read it back into another wstring using wifstream, the strings differ. When I look inside the file that was written, it does not contain the proper characters. It has ASCII characters where the Unicode characters should be. It may be that it is narrowing each Unicode character to an ASCII value before writing it to the file, but this is definitely unexpected given that wofstream is designed for wchar_t output.

    Do I need to set some locale for it to work right? I have no experience with it, so I am unsure.

    I am on Windows XP using MSVC++ 6.0 SP6. I am using the STLport STL because wofstream has a bug in the library that ships with MSVC++.

    Thanks for any help you can give.

    Joe

  2. #2
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Try the tutorial here (which is first on the list for google).
    dwk


  3. #3
    Registered User
    Join Date
    Nov 2005
    Posts
    88
    Thank you for the suggestion, but I was not able to find anything in that tutorial that addressed my question. Perhaps I am missing something, but I was not able to find it. I did read the Unicode tutorial. However, it does not address how to write wchar_t to file using wofstream. The closest it comes to mentioning this is to suggest that wchar_t should not be written out. However, if you are writing a file and reading a file with wofstream, the reasons they mentioned for not writing it to file should not be an issue as I understand it. Perhaps I am wrong, but it seems like the wofstream would not be provided if it did not at least work.

    Thanks again. Maybe you have something more specific that could help with the issue of wofstream write out and if specific locales are necessary to write out properly?

    Joe

  4. #4
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    I recommend using ofstream (not wofstream) and writing it in binary. How wofstream converts characters when it reads and writes depends on the implementation and the locale, and the result is often not what you want. In VC++ the string is converted back to a multibyte code page when it's written. E.g. writing a wstring (UTF-16) with Japanese characters on my system will result in SJIS encoded text.

    I'd recommend this method to write:
    Code:
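    std::ofstream outFile("filename.dat", std::ios::out | std::ios::binary);
    // Write the raw wchar_t code units; binary mode avoids the code page conversion.
    outFile.write(reinterpret_cast<const char *>(wstr.c_str()), wstr.length() * sizeof(wchar_t));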

    Oh, also keep in mind, if you're trying to write a text file for an editor like Notepad, it expects the first character in the file to be a Unicode byte-order mark (0xFEFF). So to write that:

    Code:
    wchar_t BOM = 0xFEFF;
    std::ofstream outFile("filename.dat", std::ios::out | std::ios::binary);
    // The BOM must be the first thing written to the file.
    outFile.write(reinterpret_cast<const char *>(&BOM), sizeof(wchar_t));
    Last edited by Cat; 11-26-2005 at 01:58 AM.
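
    The two snippets above can be combined into one routine. What follows is only a sketch under stated assumptions: the helper names (appendUnitLE, encodeUtf16LeWithBom, writeUtf16LeFile) are made up for illustration, and it assumes each wchar_t holds a single UTF-16 code unit, as it does on Windows. Rather than dumping wchar_t memory directly, it writes each code unit low byte first, so the file layout does not depend on sizeof(wchar_t) or on the machine's byte order.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Append one 16-bit code unit to the byte buffer, low byte first (little-endian).
static void appendUnitLE(std::string& bytes, unsigned int unit)
{
    bytes += static_cast<char>(unit & 0xFF);
    bytes += static_cast<char>((unit >> 8) & 0xFF);
}

// Encode text as UTF-16LE bytes preceded by the byte-order mark (bytes FF FE).
// Assumption: every wchar_t holds exactly one UTF-16 code unit (true on Windows).
std::string encodeUtf16LeWithBom(const std::wstring& text)
{
    std::string bytes;
    bytes.reserve((text.size() + 1) * 2);
    appendUnitLE(bytes, 0xFEFF);
    for (std::size_t i = 0; i < text.size(); ++i)
        appendUnitLE(bytes, static_cast<unsigned int>(text[i]) & 0xFFFF);
    return bytes;
}

// Write the encoded bytes in binary mode so the runtime applies no
// code page conversion and no newline translation.
bool writeUtf16LeFile(const char* path, const std::wstring& text)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out)
        return false;
    const std::string bytes = encodeUtf16LeWithBom(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}
```

    A call such as writeUtf16LeFile("log.txt", wstr) should then produce a file that starts with the BOM. On a platform where wchar_t is wider than 16 bits, characters above U+FFFF would be truncated by this sketch, since it does not generate surrogate pairs.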

  5. #5
    Registered User
    Join Date
    Nov 2005
    Posts
    88
    Thank you for the suggestion, Cat. Part of the reason I do not want to write it out in binary is that I am writing a log file. Will Notepad be able to open the binary file and display the textual data meaningfully? Also, binary writes become kind of cumbersome for text output with strings. The technique does work though. I wrote an encrypted stream that accepts wstrings and it properly stores and loads the data. However, this approach obviously would not work for a log file.

    Thank you again for the suggestion. Is there any way to keep it from converting back to the multibyte code page? Is this atypical behavior for a compiler?

  6. #6
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    This is what MSDN says, although you are using STLPort, so it may exhibit different behaviour.
    Quote Originally Posted by MSDN
    basic_ofstream

    When the wchar_t specialization of basic_ofstream writes to the file, if the file is opened in text mode it will write a MBCS sequence. If the file is opened in binary mode, it will write a UCS-16 sequence (two-byte Unicode sequence) to the file. Regardless of the mode the file is opened in, the internal representation will use a buffer of wchar_t characters.
    I've found the C and C++ libraries to be of limited value when dealing with Unicode. Does wofstream take an ANSI filename? If it does, it seems of very little value: on one hand it uses Unicode in the file, but on the other it breaks Unicode filenames! I would use the Windows file functions to read and write to the file.

    >> Will notepad be able to open the binary file and display the textual data meaningfully? <<

    The only change that text-mode output makes is to convert "\n" to "\r\n". You can do this yourself in your file output wrapper function, or explicitly use "\r\n" in place of "\n".
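
    As a rough illustration of doing that conversion yourself, a wrapper could run each string through something like the function below before writing it in binary mode. The name toCrLf is hypothetical. Note one deliberate difference from real text-mode output: this sketch leaves an existing "\r\n" pair alone instead of turning it into "\r\r\n".

```cpp
#include <cstddef>
#include <string>

// Return a copy of text in which every bare L'\n' becomes L"\r\n".
// A '\n' that already follows '\r' is kept as is, so existing CRLF
// pairs are not doubled.
std::wstring toCrLf(const std::wstring& text)
{
    std::wstring result;
    result.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == L'\n' && (i == 0 || text[i - 1] != L'\r'))
            result += L'\r';
        result += text[i];
    }
    return result;
}
```

    The converted string can then be written with the binary-mode approach from the earlier post, and Notepad should show the line breaks as expected.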

  7. #7
    Registered User
    Join Date
    Nov 2005
    Posts
    88
    I used WriteFile to make my own WOFStream. It seems to be pretty successful so far. Do you know of a good buffer size for file I/O? I am currently using either 1024 bytes or 4096 bytes. Internally the stream buffers with a wstring (I am assuming that its allocation and deallocation are more efficient than anything I would write myself with dynamic wchar_t buffers).

    Thank you for your help and suggestions everyone.

    (Also, I noticed my stream was significantly faster than the VC++ wofstream in the Debug Multithreaded library. It is also marginally slower than the STLport implementation. Is there any reason for the drastic difference in speed? Does wofstream verify that the bytes are written out correctly, or offer any other additional functionality that causes it to perform so slowly?)
