Thread: Creating a new file with multibyte japanese string in C

  1. #1
    Registered User
    Join Date
    Oct 2020
    Posts
    4

    Creating a new file with multibyte japanese string in C

    I am trying to create a new file with name "履歴書.txt" using C language's fopen(). I am using Windows 10 English version.

    File gets created but file name is always garbled.

    Code:
    setlocale(LC_ALL, "ja_JP.utf8");
    FILE *outputfile;
    outputfile = fopen("C:/sources/履歴書.txt", "w, ccs=UTF-8");
    if (outputfile == NULL) {
        printf("cannot open\n");
        exit(1);
    }
    fprintf(outputfile, "this is test.\n");
    fclose(outputfile);
    How can I create file with exact name "履歴書.txt" in C language?

  2. #2
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote Originally Posted by Mithlesh10 View Post
    I am trying to create a new file with name "履歴書.txt" using C language's fopen(). I am using Windows 10 English version.

    File gets created but file name is always garbled.

    Code:
    setlocale(LC_ALL, "ja_JP.utf8");
    FILE *outputfile;
    outputfile = fopen("C:/sources/履歴書.txt", "w, ccs=UTF-8");
    if (outputfile == NULL) {
            printf("cannot open\n");
            exit(1);
        }
    fprintf(outputfile, "this is test.\n");
    fclose(outputfile);
    How can I create file with exact name "履歴書.txt" in C language?

    C's fopen takes a char *. This is a sequence of 8-bit bytes designed to hold Latin characters. You might be able to pass it UTF-8, which encodes ASCII as 8 bits and Japanese as multibytes, but it probably won't work.
    So you will have to use a function with a name like _wfopen(), which takes wide characters (16 bit).


  3. #3
    Registered User
    Join Date
    Oct 2020
    Posts
    4
    Quote Originally Posted by Malcolm McLean View Post
    C's fopen takes a char *. This is a sequence of 8-bit bytes designed to hold Latin characters. You might be able to pass it UTF-8, which encodes ASCII as 8 bits and Japanese as multibytes, but it probably won't work.
    So you will have to use a function with a name like _wfopen(), which takes wide characters (16 bit).
    Thanks for your reply.

    I think I should have mentioned it while asking question, but to keep it simple I didn't.

    FYI, _wfopen() also doesn't work. I have already tried like this :

    Code:
    _wfopen(L"C:/sources/履歴書.txt", L"w,ccs=UTF-8");

  4. #4
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by Malcolm McLean View Post
    C's fopen takes a char *. This is a sequence of 8-bit bytes designed to hold Latin characters.
    ISO 9899 accepts a limited set of multibyte character sets such as Unicode, as defined in Annex D of the standard, including in identifiers (though some compilers don't respect this: clang and GCC 10 do, but not GCC 9 or below, and maybe not Visual Studio's C++).

    You might be able to pass it UTF-8, which encodes ASCII as 8 bits and Japanese as multibytes, but it probably won't work.
    It works just fine:

    [Attached image: untitled.png]

    Mithlesh10's problem is that Windows has two charsets: one for single-byte chars (Windows-1252?) and another for multibyte chars (UTF-16, or WideChar). It doesn't recognize UTF-8. This is a limitation of Windows, not of C.

    In that respect you are right: on Windows you must use wchar_t and the w* functions to use Unicode. But again, it is not a limitation of C.
    Last edited by flp1969; 10-19-2020 at 06:37 AM.

  5. #5
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by Mithlesh10 View Post
    Thanks for your reply.

    I think I should have mentioned it while asking question, but to keep it simple I didn't.

    FYI, _wfopen() also doesn't work. I have already tried like this :

    Code:
    _wfopen(L"C:/sources/履歴書.txt", L"w,ccs=UTF-8");
    It worked fine here. The problem, it seems, is that the Windows terminal doesn't recognize Unicode:

    Attachment 16219

  6. #6
    Registered User
    Join Date
    Oct 2020
    Posts
    4
    Didn't get your attachment. Please reattach or share code.

  7. #7
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Sorry... it is not code, but an image of File Explorer showing the name correctly.
    I'm getting ???.txt in the terminal, but in File Explorer it is correct.

  8. #8
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    The test code is simply:
    Code:
    if ( ! ( f = _wfopen( L"履歴書.txt", L"w,ccs=UTF-8" ) ) )
    { perror( "_wfopen" ); exit(1); }
    fclose( f );

  9. #9
    Registered User
    Join Date
    Oct 2020
    Posts
    4
    Thanks for the code. I have exactly the same code and it runs fine, but the file it creates has a garbled name. To be precise, it becomes å±¥æ..´æ›¸.txt

  10. #10
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Which compiler are you using?

  11. #11
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    If you are willing to try a somewhat hacky solution, try this:

    Code:
    #include <stdio.h>
    
    
    int main(int argc, char *argv[]) {
       // UTF-8 coding for unicode filename
       char filename[] = {0xe5, 0xb1, 0xa5, 0xe6, 0xad, 0xb4, 0xe6, 0x9b, 0xb8, '.', 't', 'x', 't', 0 };
       FILE *f = fopen(filename,"wb");
       if(f) fclose(f);
    }

  12. #12
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by hamster_nz View Post
    If you are willing for a somewhat hacky solution, try this:
    Unfortunately, this doesn't work on Windows. Here's the same code, using fopen() (test.exe) and _wfopen() (test2.exe). Notice that File Explorer shows the created file with the "correct" name when using _wfopen(), but the terminal shows ???.txt. With fopen(), the individual bytes of the UTF-8 chars are used as if they were in the Windows-1252 codepage (the '.txt' portion isn't there because File Explorer hides the extension by default):

    [Attached image: 2.jpg]

    PS: On Windows you need to encode your source code in UTF-16LE if using multibyte charsets.
    Last edited by flp1969; 10-20-2020 at 06:26 AM.
