Thread: Making a unicode text file

  1. #1
    Registered User Tonto's Avatar
    Join Date
    Jun 2005
    Location
    New York
    Posts
    1,465

    Making a unicode text file

    Code:
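    #include <iostream>
    #include <string>
    #include <fstream>
    #include <sstream>
    
    int main(int argc, char ** argv)
    {
    	using namespace std;
    
    	// Expect the input file name as the only argument.
    	if(argc < 2)
    	{
    		cerr << "usage: program <input file>" << endl;
    		return 1;
    	}
    
    	wifstream in(argv[1]);
    	wofstream out((std::string(argv[1]) + ".xxx").c_str());
    	wstring ws, e;
    
    	// Copy the first character through (intended to carry over the BOM).
    	wchar_t x[1];
    
    	in.read(x, 1);
    	out.write(x, 1);
    
    	// Split each line on tabs and write every field on its own line.
    	while(getline(in, ws))
    	{
    		wstringstream ww(ws);
    
    		while(getline(ww, e, L'\t'))
    		{
    			out << e << endl;
    		}
    	}
    }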
    The file that I produce is not read properly by a text editor (Notepad, Vim, whatever), while the file that I read from is. I have examined the bytes of both files, and the BOM is the same, as is the format of the Unicode character data. What should I do?

  2. #2
    Registered User Tonto's Avatar
    Join Date
    Jun 2005
    Location
    New York
    Posts
    1,465
    Attached are the generated file and the test file.

  3. #3
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    Just curious, why are you using read and write? Why not get()?

  4. #4
    Registered User Tonto's Avatar
    Join Date
    Jun 2005
    Location
    New York
    Posts
    1,465
    The generated file appears to have an EOF marker and the other doesn't. Curious.

    >> Just curious, why are you using read and write? Why not get()?

    I'unno.

    get/put
    read/write
    <</>>
    Last edited by Tonto; 02-15-2008 at 06:16 PM.

  5. #5
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    The wide file streams perform a largely implementation-defined conversion on I/O. It depends mostly on your locale. If you're using a Windows ANSI locale, which you most likely are, it will convert the internal UTF-16 to Windows-1252 on writing. If you're on Linux, it will convert the internal UTF-32 to ISO-8859-1, or to UTF-8 if you're using a UTF-8 locale.

    All this adds up to the fact that C++'s character handling is quite useless.

    You could write a UTF-16 codecvt facet.
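
    For illustration, here is a minimal sketch of a different route to the same result: instead of a codecvt facet, it encodes UTF-16LE by hand and writes the bytes through a narrow stream opened in binary mode, so no locale conversion gets a chance to run. The function names (append_utf16le, to_utf16le_with_bom, write_utf16le_file) are made up for this example, and it assumes the text is already available as a std::u32string of code points, which needs C++11.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Append one Unicode code point to `out` as UTF-16LE bytes.
// (Hypothetical helper for this sketch, not a standard library function.)
void append_utf16le(std::string& out, char32_t cp)
{
    // Only valid scalar values: in range and not a lone surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    auto put_unit = [&out](std::uint16_t u) {
        out.push_back(static_cast<char>(u & 0xFF));        // low byte first
        out.push_back(static_cast<char>((u >> 8) & 0xFF)); // then high byte
    };

    if (cp < 0x10000) {
        put_unit(static_cast<std::uint16_t>(cp));
    } else {
        // Code points above the BMP become a surrogate pair.
        cp -= 0x10000;
        put_unit(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        put_unit(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Encode a whole string, with the BOM (U+FEFF) in front.
std::string to_utf16le_with_bom(const std::u32string& text)
{
    std::string bytes;
    append_utf16le(bytes, 0xFEFF);
    for (char32_t cp : text)
        append_utf16le(bytes, cp);
    return bytes;
}

// Write the encoded bytes untouched; binary mode avoids newline translation.
bool write_utf16le_file(const char* path, const std::u32string& text)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;
    const std::string bytes = to_utf16le_with_bom(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

    A call such as write_utf16le_file("out.txt", U"a\r\nb\r\n") would then produce a file that starts with the bytes FF FE. Because the stream is binary, line endings have to be spelled out by the caller.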
    Last edited by CornedBee; 02-15-2008 at 07:53 PM.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Didn't realize you started another thread on the subject....

    http://cboard.cprogramming.com/showp...63&postcount=8

    When I wrote that, I was only thinking of Windows and MSVC, where wchar_t is 2 bytes and wide string literals are UTF-16LE. I keep forgetting that sizeof(wchar_t) and the character encoding of string literals are implementation-defined.
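
    To make that concrete, here is a small sketch of how code could check the size at compile time rather than assume it. The names wchar_is_utf16_sized and wchar_units_for are invented for this example, and the mapping from size to encoding (2 bytes means UTF-16, anything wider means one element per code point) is an assumption that matches MSVC and typical Linux toolchains, not something the standard guarantees.

```cpp
#include <cstddef>

// Assumption: a 2-byte wchar_t holds UTF-16 code units (as with MSVC),
// and a wider wchar_t holds whole code points (as on typical Linux builds).
constexpr bool wchar_is_utf16_sized = (sizeof(wchar_t) == 2);

// How many wchar_t elements one code point occupies under that assumption.
constexpr std::size_t wchar_units_for(char32_t cp)
{
    return (wchar_is_utf16_sized && cp >= 0x10000) ? 2 : 1;
}
```

    Code that writes wchar_t data to disk byte for byte could branch on a check like this instead of hard-coding 2 bytes per character.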

    gg
