Thread: Unicode file I/O functions

  1. #1
    Registered User
    Join Date
    Oct 2005
    Posts
    13

    Unicode file I/O functions

    When using TCHAR there are the generic data type TCHAR, generic string functions like the _tcs* family, and the _T/_TEXT macros. Are there any generic file I/O functions that switch between fopen/_wfopen, fwrite and a wide equivalent, etc., based on the UNICODE preprocessor definition?

    In addition, a related question: when using swprintf, wprintf, or fwprintf, how does one handle a format specifier like "%s"? I do not see a generic mapping layer like _T for this. How is this problem addressed?

    MC
    Last edited by MiamiCuse; 11-03-2005 at 12:39 PM.

  2. #2
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    fread() and fwrite() do not have UNICODE versions because both functions work with raw binary data, which is not affected by whether the project is compiled for UNICODE. The UNICODE setting only matters for text-handling functions and functions that take strings or character arrays as arguments; other binary data is not affected.
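    There is, however, a generic mapping for fopen itself: _tfopen in <tchar.h> expands to fopen or _wfopen depending on the build (the CRT generic-text mappings key off _UNICODE rather than UNICODE), and fwrite is the same call either way. A minimal sketch, assuming a Microsoft compiler:
    Code:
    #include <stdio.h>
    #include <tchar.h>   /* TCHAR, _T, _tfopen */

    int main(void)
    {
        TCHAR buf[] = _T("hello");

        /* _tfopen expands to fopen or _wfopen depending on _UNICODE */
        FILE *fp = _tfopen(_T("out.dat"), _T("wb"));
        if (fp == NULL)
            return 1;

        /* fwrite is the same call in both builds; only sizeof(TCHAR) changes */
        fwrite(buf, sizeof(TCHAR), sizeof(buf) / sizeof(TCHAR), fp);
        fclose(fp);
        return 0;
    }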

  3. #3
    Registered User
    Join Date
    Oct 2005
    Posts
    13
    Thank you Ancient Dragon...I see what you are talking about. The data structure I have which contains strings will use either wchar_t or char depending on whether UNICODE is defined, so it works itself out when using fwrite. Now, having said that, the UNICODE version of the application (let's call it wAPP) will produce a data file (let's call it wFILE) that differs from the file (let's call it FILE) produced by the non-UNICODE version (let's call it APP), because of the data structure difference.

    This means that if I use APP to open wFILE, or wAPP to open FILE, I will be in trouble, yes? Although I do not see this happening in practice, is there a way to avoid this situation? Is there a way to open a file and find out which way it was written WITHOUT adding logic in the application code to write some sort of version marker into it? I would prefer to preserve the current legacy format of FILE and not have to change the file format just because we will be building a UNICODE version.

    Now another question would be even for binary file if I have a structure like:
    Code:
    struct person
    {
       TCHAR name[256];
       long ID;
       double height;
    };
    and I write a bunch of these into the file using fwrite, will the name field be encoded differently in the file depending on the locale settings, or not?

    Thanks,

    MC

  4. #4
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    1. I know of no way to detect the difference between UNICODE and non-UNICODE binary files.

    2. It depends on the operating system. On MS-Windows wchar_t is defined as unsigned short (2 bytes), while on *nix it is typically a 4-byte integer, which makes porting the data files even more difficult. The last time I looked at the UNICODE standards, the committee was considering increasing it again to an 8-byte integer so that it can accommodate the large graphic character sets used in Chinese and similar languages. That means each character of the name field in your structure could occupy 2, 4, or 8 bytes, depending on the compiler and operating system. This is a compile-time property and is unaffected by locale settings.

    Here are some suggestions for forcing all characters into one 8-bit byte. I haven't used them myself, so encoding them might be troublesome and slow.
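    One such approach is to convert the wide strings to UTF-8 before writing, so every code unit in the file is a single byte. A rough sketch using the Win32 WideCharToMultiByte() call (Windows-only; the helper name and the minimal error handling are just illustrative):
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <windows.h>

    /* Convert a wide string to UTF-8 and write it; returns 0 on success */
    static int write_utf8(FILE *fp, const wchar_t *ws)
    {
        /* First call asks how many bytes the UTF-8 form needs (incl. terminator) */
        int bytes = WideCharToMultiByte(CP_UTF8, 0, ws, -1, NULL, 0, NULL, NULL);
        if (bytes <= 0)
            return -1;

        char *utf8 = malloc(bytes);
        if (utf8 == NULL)
            return -1;

        WideCharToMultiByte(CP_UTF8, 0, ws, -1, utf8, bytes, NULL, NULL);
        fwrite(utf8, 1, bytes, fp);   /* writes the terminating zero too */
        free(utf8);
        return 0;
    }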
    Last edited by Ancient Dragon; 11-03-2005 at 03:00 PM.

  5. #5
    Registered Luser cwr's Avatar
    Join Date
    Jul 2005
    Location
    Sydney, Australia
    Posts
    869
    The truly portable method is to write everything in text format.

    See also http://www.eskimo.com/~scs/C-faq/q20.5.html.
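    For the person structure above, that could look roughly like this (plain char and one tab-separated line per record are just one possible choice; error handling kept minimal):
    Code:
    #include <stdio.h>

    struct person
    {
        char   name[256];
        long   id;
        double height;
    };

    /* Write one record as a line of text; portable across compilers and OSes */
    static void save_person(FILE *fp, const struct person *p)
    {
        fprintf(fp, "%s\t%ld\t%f\n", p->name, p->id, p->height);
    }

    /* Read one record back; returns 1 on success, 0 on end of file or error */
    static int load_person(FILE *fp, struct person *p)
    {
        return fscanf(fp, " %255[^\t]\t%ld\t%lf", p->name, &p->id, &p->height) == 3;
    }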

  6. #6
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Is there a way to open a file and find out if it's written one way versus another WITHOUT logic in the application
    No, there isn't. A simple way to get around your problem is to make the first 4 bytes of the file an indicator that tells the application how to read the data (as Unicode, ASCII, etc.).
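    A rough sketch of such a marker, with made-up tag values (on Windows an unsigned long is 4 bytes); a legacy file written before the marker existed will usually fail to match either tag and can be treated as the old format:
    Code:
    #include <stdio.h>

    /* Illustrative tag values; any two distinct constants would do */
    enum file_format { FMT_UNKNOWN = 0, FMT_ANSI = 1, FMT_UNICODE = 2 };

    /* Write the 4-byte marker before the records */
    static void write_format_tag(FILE *fp, unsigned long tag)
    {
        fwrite(&tag, sizeof tag, 1, fp);
    }

    /* Read the marker back and report how the rest of the file should be read */
    static enum file_format read_format_tag(FILE *fp)
    {
        unsigned long tag = 0;
        if (fread(&tag, sizeof tag, 1, fp) != 1)
            return FMT_UNKNOWN;
        if (tag == FMT_ANSI)
            return FMT_ANSI;
        if (tag == FMT_UNICODE)
            return FMT_UNICODE;
        return FMT_UNKNOWN;   /* probably a legacy file with no marker */
    }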

    In addition, a related question is when using swprintf or wprintf or fwprintf, how does one address the format specifier like "%s"? I do not see a generic mapping layer like _T for this. How is this problem addressed?
    Use _stprintf() or _sntprintf()
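    With the generic mappings from <tchar.h>, "%s" then takes a TCHAR string in both builds (this is the Microsoft CRT's behaviour), roughly:
    Code:
    #include <stdio.h>
    #include <tchar.h>

    int main(void)
    {
        TCHAR name[] = _T("MiamiCuse");
        TCHAR buf[64];

        /* _stprintf maps to sprintf or _swprintf; %s matches TCHAR in both builds */
        _stprintf(buf, _T("hello %s"), name);
        _tprintf(_T("%s\n"), buf);
        return 0;
    }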

  7. #7
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    In addition, a related question is when using swprintf or wprintf or fwprintf, how does one address the format specifier like "%s"?
    See the format specifications %s and %S (note the capitalization).
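    In the Microsoft CRT the uppercase specifier selects the "other" width from whichever printf variant you are calling; a small illustration (MSVC-specific behaviour, not standard C, which uses %ls for wide strings instead):
    Code:
    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        char    narrow[] = "narrow";
        wchar_t wide[]   = L"wide";

        printf("%s %S\n", narrow, wide);    /* printf: %s = char*, %S = wchar_t* */
        wprintf(L"%s %S\n", wide, narrow);  /* wprintf: the meanings swap        */
        return 0;
    }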
    Last edited by Ancient Dragon; 11-03-2005 at 08:27 PM.

  8. #8
    Registered User
    Join Date
    Oct 2005
    Posts
    13
    So it seems using %s will do for both the non-UNICODE and UNICODE variations as long as the generic _tprintf-family functions are used. Correct?

    Thanks,

    MC

  9. #9
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    correct

