Is it possible to open a file that has international characters in its name (for example, Turkish characters) using the standard C++ library (instead of a framework's API)?
Yes, it's possible.
No. The problem is a bit complicated to explain in detail, but you run into all sorts of differences between systems. The standard guarantees that you can open files by narrow-string name (wide-string overloads are a vendor extension before C++17), but is mum on what the encoding of the file names should be. Thus, making some sort of assumption (like passing in a UTF8 file name) is not going to be portable, if it works at all.
Expanding on what whiteflags said: in cases like these, where you want to use non-English characters in a file's name, you need to provide a way to convert from your encoding of the name to the native encoding of the current system. And that's assuming the system converted the name the same way you would. It's quite a mess, which is why most programs stick with English/Latin letters and numbers only (and maybe an underscore and/or minus sign).
Thanks for your answers.
If you're trying to make a portable app then you need to go into the aforementioned details. If you're just trying to open some files on your own system then just try it. For example, this works for me (on Linux).
Code:
#include <iostream>
#include <fstream>

int main() {
    std::ifstream f("ģĤĥ");
    if (!f) {
        std::cerr << "error opening file\n";
        return 1;
    }
    char s[100];
    f.getline(s, 100);
    std::cout << s << '\n';
    return 0;
}
The problem will most likely boil down to whether your system supports UTF8 in its interfaces taking char* parameters. Linux does. Windows does not.
There are wide versions, e.g. std::wifstream, but they come with their own problems and are tricky to use right.
In short, it's really platform specific and the best way to do it is just to use platform specific APIs because the standard guarantees nothing in this particular topic. You can have different behaviour between compilers and platforms.
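For what it's worth, a C++17 toolchain narrows this gap somewhat: file streams accept a std::filesystem::path, and a path can be built from UTF8 bytes. Below is a minimal sketch under that assumption. The helper names path_from_utf8 and read_first_line are made up for illustration, and whether the file is actually found still depends on how the file system stores the name.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: build a path from UTF8 bytes held in a std::string.
std::filesystem::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: char8_t strings are always interpreted as UTF8.
    return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path does the same job (it is deprecated in C++20).
    return std::filesystem::u8path(utf8);
#endif
}

// Hypothetical helper: return the first line of the file, or "" on failure.
std::string read_first_line(const std::filesystem::path& p) {
    std::ifstream in(p);  // C++17 overload taking a filesystem path
    std::string line;
    if (in) {
        std::getline(in, line);
    }
    return line;
}
```

With these, read_first_line(path_from_utf8(name)) should behave the same on Linux and Windows as long as name really holds UTF8 bytes; the library converts to the native path encoding internally.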
Thanks for your answers.
algorism, yes, you are right: opening a file (which has international characters in its name) using standard C++ functions worked well, like you said (at least on my system). I thought it wasn't working because I had let the user select a file using C++ Builder's API, and after the selection I tried to open that file using standard C++ functions, which failed if the name had international characters in it. So I concluded that standard C++ functions couldn't open that kind of file. There is still another problem, though: standard C++ functions don't seem able to read the international characters inside files (at least when opened in text mode).
Like people suggested, it seems the best way is to use platform specific api functions, for portability.
Last edited by Awareness; 04-08-2017 at 02:38 PM.
They may or may not. It depends on your character set. If it's UTF8 or ANSI, there shouldn't be any problems. But if you're using UTF16 or something else, you may get problems with \r and \n characters. If you just use UTF8 everywhere in your files and internally, you will bypass this problem. But it doesn't solve international characters in the filename itself.
Thanks Elysia. How can I use UTF-8 in C++?
The best way is to keep all your files in UTF8 and to keep all your strings internally as UTF8 too. String literals are a bit tricky, since there is no native way to do this: it depends on your compiler whether it can read a source file saved as UTF8 in such a way that international characters are kept intact. Another way is to save the source in some other encoding that your compiler supports (e.g. UTF16), so that it stores the international characters correctly, and convert them to UTF8 at runtime.
When reading or writing, use narrow streams. Do not use wide streams (e.g. std::wifstream). When interfacing with the platform API on Windows, convert to UTF16. Linux works natively with UTF8. For other platforms, you need to check whether they accept Unicode, and how. Use std::string, not std::wstring. All string algorithms work natively with UTF8. Opening files with international characters in their names will be problematic unless you use the platform API, so avoid such file names if you can.
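To make the "convert to UTF16" step concrete, here is a rough, portable sketch of a UTF8 to UTF16 decoder. On Windows itself you would more likely call MultiByteToWideChar with CP_UTF8. The function name utf8_to_utf16 is hypothetical, and this version only replaces malformed bytes with U+FFFD; it does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF8 bytes into UTF16 code units.
// Malformed sequences become U+FFFD; overlong forms are not detected.
std::u16string utf8_to_utf16(const std::string& in) {
    const char16_t replacement = static_cast<char16_t>(0xFFFD);
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(replacement); ++i; continue; }  // invalid lead byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size()) { ok = false; break; }  // truncated sequence
            const unsigned char c = static_cast<unsigned char>(in[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
            ++i;
        }
        if (!ok || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(replacement);
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP need a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, wchar_t is 16 bits wide, so the resulting code units can be copied into a std::wstring before being handed to the wide-character API.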
This should mostly work. You might find some edge cases, though. But that's life since C++ has such bad Unicode support.
The string algorithms work on UTF8, as long as you don't mind ending up in the middle of a code point. It's a little more than an edge case.
Well, depends on the algorithm, I guess. I can see sort() messing up the string. I can see a reverse algorithm also messing up the string. Good point. Didn't think of that. I've rarely run algorithms on UTF8 strings.
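To illustrate the code point issue, here is a small sketch of a length and a reverse that step over whole code points instead of bytes. The names is_continuation, utf8_length and utf8_reverse are made up for this example, and both functions assume the input is already valid UTF8. Note that reversing by code point can still split combining sequences, so this is not a complete answer either.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// True for UTF8 continuation bytes, which have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: count code points by counting non-continuation bytes.
std::size_t utf8_length(const std::string& s) {
    return static_cast<std::size_t>(std::count_if(
        s.begin(), s.end(),
        [](char c) { return !is_continuation(static_cast<unsigned char>(c)); }));
}

// Hypothetical helper: reverse the order of code points, keeping the bytes
// of each code point in their original order.
std::string utf8_reverse(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    std::size_t end = s.size();
    while (end > 0) {
        // Walk back from the last byte to the lead byte of this code point.
        std::size_t start = end - 1;
        while (start > 0 && is_continuation(static_cast<unsigned char>(s[start]))) {
            --start;
        }
        out.append(s, start, end - start);
        end = start;
    }
    return out;
}
```

By contrast, std::reverse on the same std::string would flip the bytes inside each multi-byte character and produce invalid UTF8.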
If you need to run algorithms on UTF8 strings, I would recommend you get a UTF8 library for C++ from the web. There are some intuitive ones out there that provide a u8string similar to std::string.
Thanks for your answers.