Thread: Reading and writing international characters, searching for a string/character inside

  1. #1
    Registered User
    Join Date
    Feb 2012
    Posts
    29

    I am using an old version of C++, from before the C++11 standard.
    I have been trying to read a file (which includes some Turkish characters) into a wstring, string or vector, write the contents to another file, then read that new file back into a wstring, string or vector, write it out to yet another file, and so on.


    I also want to be able to use functions like find() on the wstring/string/vector.


    I have tried many things over the last 10 days or so with string, wstring and vector, but I haven't been able to do what I want. Yesterday I thought I had managed it using a vector and the copy() function (I found the code online and adapted it to my purpose), but unfortunately it didn't copy whitespace characters, so it was no use to me.
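
On the lost whitespace: one likely cause, assuming the copy() call read through std::istream_iterator<char>, is that this iterator uses operator>> and therefore skips spaces and newlines. std::istreambuf_iterator<char> reads raw bytes instead. A minimal pre-C++11 sketch (read_all and write_all are hypothetical helper names, not from the original code):

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read every byte of a file, whitespace included.
// Returns an empty string if the file cannot be opened.
std::string read_all(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return std::string();
    // istreambuf_iterator reads unformatted bytes, so spaces and
    // newlines are kept (istream_iterator<char> would skip them).
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: write the bytes back unchanged.
bool write_all(const char* path, const std::string& data)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return out.good();
}
```

Because the bytes are copied untouched, a read/write/read cycle built on these two helpers should leave a UTF-8 file identical at every step.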


    I have tried so many things that I am not sure which one to post here, but one of them was:




    ************************


    Code:
    #include <iostream>
    #include <vector>
    #include <fstream>
    #include <iterator>
    #include <algorithm>
    #include <cstddef>

    using namespace std;

    int main()
    {
        vector<char>cv ;

       ifstream if1 ("hello.txt",ios::binary);

       if (!(if1))
       {
           cout << "hello.txt can't be opened!";
           return 0;
       }

       // Note: testing eof() before get() stores one extra character
       // (the result of the failed read) at the end of the vector.
       while (!(if1.eof()))
       {
        cv.push_back(if1.get());
       }
        cv.push_back('\0');

       ofstream of("file1.txt",ios::binary);

       if (!(of))
       {
           cout << "file1.txt can't be opened!";
           return 0;
       }

       cout << "\n\ncv: " << cv.data() << endl;

       // Write the contents, skipping the two trailing extras
       // (the failed read and the '\0'); capped at 39 characters.
       int counter=0,i=0;
       vector<char>::iterator oic = cv.begin();
       while (oic!=cv.end()-2 && counter<39)
       {
           oic++;
           counter++;

           of.put(cv[i]);
           i++;
       }

       ptrdiff_t position = find(cv.begin(),cv.end(),'c') - cv.begin();
       cout << "\n\nposition of c: " << position << endl;
       if1.close();
       of.close();

       // Second pass: read file1.txt back and write it to file2.txt.
       vector<char>cy ;

       ifstream if2 ("file1.txt",ios::binary);

       if (!(if2))
       {
           cout << "file1.txt can't be opened!";
           return 0;
       }

       while (!(if2.eof()))
       {
        cy.push_back(if2.get());
       }

       cy.push_back('\0');

       ofstream of2("file2.txt",ios::binary);

       if (!(of2))
       {
           cout << "file2.txt can't be opened!";
           return 0;
       }

        cout << "cy: " << cy.data();
        counter=0;
        i=0;

       vector<char>::iterator oic2 = cy.begin();
       while (oic2!=cy.end()-2 && counter<39)
       {
           oic2++;
           counter++;

           of2.put(cy[i]);
           i++;
       }

        if2.close();
        of2.close();

        ptrdiff_t pos = find(cy.begin(),cy.end(),'c')- cy.begin();
        cout << "\nposition of c " << pos;
        return 0;
    }

    *****************


    This code seems to read from and write to the files correctly, and when I search the vector for a character like 'c' or 'o' it finds its position. Unfortunately, when I try to find a Turkish letter like 'ç' instead of 'c' using:


    Code:
    ptrdiff_t position = find(cv.begin(),cv.end(),'ç') - cv.begin();
    it fails to find the letter in the vector.


    How can I solve this problem?
    Last edited by Awareness; 09-23-2019 at 07:11 PM.

  2. #2
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    First of all, your indentation needs work.
    Code:
    #include <iostream>
    #include <vector>
    #include <fstream>
    #include <iterator>
    #include <algorithm>
    using namespace std;
    
    int main()
    {
      vector < wchar_t >cv;
    
      wifstream if1("hello.txt", ios::binary);
      if (!(if1)) {
        cout << "hello.txt can't be opened!";
        return 0;
      }
    
      while (!(if1.eof())) {
        cv.push_back(if1.get());
      }
      cv.push_back('\0');
    
    #if 0
      ofstream of("file1.txt", ios::binary);
      if (!(of)) {
        cout << "file1.txt can't be opened!";
        return 0;
      }
    
      cout << "\n\ncv: " << cv.data() << endl;
      int counter = 0, i = 0;
      vector < char >::iterator oic = cv.begin();
      while (oic != cv.end() - 2 && counter < 39) {
        oic++;
        counter++;
        of.put(cv[i]);
        i++;
      }
    #endif
    
      ptrdiff_t position = find(cv.begin(), cv.end(), 'c') - cv.begin();
      cout << "\n\nposition of c: " << position << endl;
      ptrdiff_t position2 = find(cv.begin(), cv.end(), L'ç') - cv.begin();
      wcout << L"\n\nposition of ç: " << position2 << endl;
      if1.close();
    
    #if 0
      of.close();
      vector < char >cy;
      ifstream if2("file1.txt", ios::binary);
      if (!(if2)) {
        cout << "file1.txt can't be opened!";
        return 0;
      }
    
      while (!(if2.eof())) {
        cy.push_back(if2.get());
      }
      cy.push_back('\0');
    
      ofstream of2("file2.txt", ios::binary);
      if (!(of2)) {
        cout << "file2.txt can't be opened!";
        return 0;
      }
    
      cout << "cy: " << cy.data();
      counter = 0;
      i = 0;
      vector < char >::iterator oic2 = cy.begin();
      while (oic2 != cy.end() - 2 && counter < 39) {
        oic2++;
        counter++;
        of2.put(cy[i]);
        i++;
      }
    
      if2.close();
      of2.close();
    
      ptrdiff_t pos = find(cy.begin(), cy.end(), 'c') - cy.begin();
      cout << "\nposition of c " << pos;
    #endif
      return 0;
    }
    Second, you might need to think about how your text file is encoded (UTF-8, UTF-16, UCS-2 ...).
    E.g.
    Code:
    $ hd hello-16le.txt 
    00000000  54 00 68 00 69 00 73 00  20 00 69 00 73 00 20 00  |T.h.i.s. .i.s. .|
    00000010  63 00 0a 00 54 00 68 00  61 00 74 00 20 00 69 00  |c...T.h.a.t. .i.|
    00000020  73 00 20 00 e7 00 0a 00                           |s. .....|
    00000028
    $ hd hello.txt 
    00000000  54 68 69 73 20 69 73 20  63 0a 54 68 61 74 20 69  |This is c.That i|
    00000010  73 20 c3 a7 0a                                    |s ...|
    00000015
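
The second dump shows ç stored as the two bytes c3 a7, so in a UTF-8 file no single char can equal it. Assuming the source file is also saved as UTF-8, 'ç' in the code becomes a multi-character literal rather than one char, which would explain why find() on a vector<char> never matches. If narrow strings are preferred, one option is to search for the byte sequence instead. A rough sketch (find_utf8 and char_index are hypothetical names):

```cpp
#include <string>

// Hypothetical helper: byte offset of a UTF-8 encoded substring,
// or -1 if it does not occur.
long find_utf8(const std::string& text, const std::string& needle)
{
    std::string::size_type pos = text.find(needle);
    return pos == std::string::npos ? -1L : static_cast<long>(pos);
}

// Hypothetical helper: count the code points before byte offset `end`
// by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx).
long char_index(const std::string& text, std::string::size_type end)
{
    long count = 0;
    for (std::string::size_type i = 0; i < end && i < text.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(text[i]);
        if ((b & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

Here find_utf8(text, "\xC3\xA7") gives a byte offset, which differs from the character index as soon as an earlier character takes more than one byte; char_index converts one into the other.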
    The short answer is to use wide characters and wide streams - as I've done in your code.
    Code:
    $ g++ foo.cpp
    $ hd hello.txt 
    00000000  54 68 69 73 20 69 73 20  63 0a 54 68 61 74 20 69  |This is c.That i|
    00000010  73 20 c3 a7 0a                                    |s ...|
    00000015
    $ ./a.out 
    
    
    position of c: 8
    
    
    position of �: 20
    $ cat hello.txt 
    This is c
    That is ç
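
A caution when reading that output: std::find returns end() when nothing matches, so the printed difference is then simply the container size. In hello.txt the ç sits at byte offset 18 (and is also the 19th character, index 18), so the 20 printed above may well be the not-found case rather than a real position. Comparing against end() makes that explicit. A small sketch, with index_of as a hypothetical helper name:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of `ch` in `v`, or -1 when absent.
// std::find returns v.end() on failure, so end() - begin() (the size)
// must not be mistaken for a real position.
std::ptrdiff_t index_of(const std::vector<wchar_t>& v, wchar_t ch)
{
    std::vector<wchar_t>::const_iterator it = std::find(v.begin(), v.end(), ch);
    return it == v.end() ? -1 : it - v.begin();
}
```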

  3. #3
    Registered User
    Join Date
    Feb 2012
    Posts
    29
    Well, I think using the Boost library and imbue() solved the problem.

    I used it like this:

    std::locale loc = boost::locale::generator().generate("tr_TR.UTF-8");
    wofstream filestream1;
    filestream1.imbue(loc);

    Now wifstream and wofstream work nicely, and the find() function of wstring works as it should: it finds the letters at the right positions. I also tested without imbue(). wofstream doesn't work right without it, and when I read with wifstream into a wstring without imbue(), the wstring's find() does find the English letters but reports them at the wrong positions.

    Boost and imbue() also solved my main problem: wstring objects can now find Turkish characters, and at the right positions. I haven't tested binary mode yet.

  4. #4
    Registered User
    Join Date
    Feb 2012
    Posts
    29
    Sorry, I posted without noticing your answer. Thanks for your answer, Salem. Without Boost and imbue(), wofstream and wifstream were not working right, as I explained in my previous post. Now it works without problems, though I wonder whether it would be possible to use imbue() without Boost in old C++. (In the past, before using Boost, I wasn't able to make imbue() work.)
    Last edited by Awareness; 09-24-2019 at 12:36 AM.
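
On imbue() without Boost: the pre-C++11 standard library already has std::locale, whose constructor takes a locale name and throws std::runtime_error if the system does not know it. Whether "tr_TR.UTF-8" is accepted depends on the platform (on Linux it has to be installed; other systems use different names), so treat the following as a sketch under that assumption. read_wide and write_wide are hypothetical helper names:

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file through a wide stream imbued
// with a named locale. `ok` is set to false if the locale is not
// available or the file cannot be opened.
std::wstring read_wide(const char* path, const char* locale_name, bool& ok)
{
    ok = false;
    std::wstring text;
    try {
        std::locale loc(locale_name);   // throws std::runtime_error if unknown
        std::wifstream in;
        in.imbue(loc);                  // imbue before open()
        in.open(path, std::ios::binary);
        if (!in)
            return text;
        wchar_t ch;
        while (in.get(ch))              // stops cleanly at end of file
            text.push_back(ch);
        ok = true;
    } catch (const std::runtime_error&) {
        // Locale name not available on this system.
    }
    return text;
}

// Hypothetical helper: write a wide string through a stream imbued
// with the same kind of locale. Returns false on any failure.
bool write_wide(const char* path, const std::wstring& text,
                const char* locale_name)
{
    try {
        std::locale loc(locale_name);
        std::wofstream out;
        out.imbue(loc);
        out.open(path, std::ios::binary);
        if (!out)
            return false;
        out << text;
        out.flush();
        return out.good();
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

With a UTF-8 locale imbued on both streams, text read by read_wide should contain one wchar_t per character, so wstring::find reports character positions, and write_wide should produce the same bytes again.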
