Thread: Reading .txt files with whitespaces within data-fields

  1. #1
    Registered User
    Join Date
    Mar 2016
    Posts
    203

    Reading .txt files with whitespaces within data-fields

    I'm using the following code to read .txt files and print to console:
    Code:
    
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <tuple>

    using namespace std;

    int main()
    {
        fstream File;
        string name, address;
        int age;
        vector<tuple<string, string, int>> v;
        File.open("F:\\test.txt");
        if (File.is_open()) {
            string line;
            while (getline(File, line)) {
                stringstream stream(line);
                while (stream >> name >> address >> age) {
                    v.emplace_back(name, address, age);
                }
            }
        }
        for (auto& itr : v) {
            cout << "name: " << get<0>(itr) << ", address: " << get<1>(itr) << ", age: " << get<2>(itr) << "\n";
        }
    }
    When my .txt file is:
    Name Address Age
    John London 45
    Jane Cambridge 31
    Output:
    Code:
    name: John, address: London, age: 45
    name: Jane, address: Cambridge, age: 31
    Note that the first line of the .txt file (Name Address Age) is not printed in this case. But if I remove that line so the .txt file starts with actual names, addresses and ages, the first entry is still printed fine. Why this dichotomy?


    And when my .txt file is:
    Name Address Age
    John Smith Abbey Road, London 45
    Jane Doe Huntingdon Road, Cambridge 31


    The output is essentially nothing:
    Code:
    Process returned 0 (0x0)   execution time : 0.025 s
    Press any key to continue.

    I understand that having whitespace within data fields is what breaks the stringstream extraction. How could I best write the program so that it reads a string containing whitespace into a single string and then moves on to the next variable, which could be another string containing whitespace (as in this example) or some other datatype? It seems that the stringstream ctor does not take a delimiter, otherwise I might have tried ...
    Code:
     stringstream stream(line, ';');
    ... with the semi-colon delimiting the datafields in the .txt file.


    Thanks as ever.

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Quote Originally Posted by sean_cantab
    Note that the first line of the .txt file (Name Address Age) is not printed in this case but if I remove the first line and start the .txt file with actual names, addresses, ages then still the first entry is printed fine. Why this dichotomy?
    Because the word "Age" does NOT parse as an integer, the extraction in while(stream>>name>>address>>age) fails on the header line, so nothing is stored for it.
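
    To make the failure concrete, here is a minimal sketch that applies the same extraction to a single line and reports whether it succeeded. The helper name parse_line is made up for this illustration and is not part of the original code.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: applies the same extraction as the loop in post #1
// to a single line and reports whether all three fields were read.
bool parse_line(const std::string& line, std::string& name, std::string& address, int& age)
{
    std::istringstream stream(line);
    // ">> age" sets failbit when the next token is not a number,
    // which makes the whole expression evaluate to false.
    return static_cast<bool>(stream >> name >> address >> age);
}
```

    Called with "Name Address Age" this returns false, because "Age" is not a number. The same happens with "John Smith Abbey Road, London 45": the third whitespace-separated token is "Abbey", so the read into age fails and nothing is stored.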

    Without delimiters, how would you parse names like these?
    Henry Ford
    Edgar Allan Poe
    Jean Claude Van Damme


    You can use getline() on a string stream, which does allow you to choose a delimiter on each successive call.
    parsing - Parse (split) a string in C++ using string delimiter (standard C++) - Stack Overflow
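
    As a rough sketch of that suggestion, assuming the fields in the file are separated by semicolons (the separator the original poster proposed), a line can be split like this. The helper name split_fields is made up for the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into fields at each delimiter.
std::vector<std::string> split_fields(const std::string& line, char delimiter)
{
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    // Each call reads up to (and discards) the next delimiter,
    // so spaces inside a field are kept.
    while (std::getline(stream, field, delimiter)) {
        fields.push_back(field);
    }
    return fields;
}
```

    For the line "John Smith;Abbey Road, London;45" this yields three fields, with the age still held as the string "45"; converting it to an int is a separate step.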

  3. #3
    Registered User
    Join Date
    Mar 2016
    Posts
    203
    @Salem - thanks for the heads-up, this has got me going, but I haven't reached the final solution yet. My initial effort has been along the following lines:

    0. read the file a line at a time and, for each line, generate the tokens (token) with find(delimiter) as described in the link sent
    1. initialize a stringstream object, stream, with token
    2. declare an int counter, t, and a temp tuple, MyTuple
    3. assign stream to element t of MyTuple with get<t>(MyTuple) = stream;
    4. t++
    5. when the entire line has been read, v.push_back(MyTuple) where v is the vector<tuple<string,string,int>>
    6. read the next line and repeat the above ...

    But there's a problem at step 3, because std::get<t>(MyTuple) only works if t is a compile-time constant. I'm showing the code as it stands below and have flagged the line that doesn't work. If anybody can help me out from here on, that would be most appreciated. Thanks
    Code:
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <tuple>
    using namespace std;

    int main()
    {
        fstream File;
        vector<tuple<string, string, int>> v;
        File.open("F:\\test.txt");
        if (File.is_open()) {
            string line;
            string delimiter = ";";
            size_t pos = 0;
            string token;
            while (getline(File, line)) {
                tuple<string, string, int> MyTuple{};
                while ((pos = line.find(delimiter)) != string::npos) {
                    token = line.substr(0, pos);
                    stringstream stream(token);
                    int t = 0;
                    get<t>(MyTuple) = stream; // DOESN'T WORK!!!!
                    t++;
                    line.erase(0, pos + delimiter.length());
                }
                v.push_back(MyTuple);
            }
        }
        File.close();

        for (auto& itr : v) {
            cout << "name: " << get<0>(itr) << ", address: " << get<1>(itr) << ", age: " << get<2>(itr) << "\n";
        }
    }
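
    The index given to std::get is a template argument, so it has to be known at compile time; a run-time counter such as t cannot be used there. One possible way around it, sketched here on the assumption that every line holds exactly three ';'-separated fields, is to spell the indices out as literals. The names Record and parse_record are made up for this illustration.

```cpp
#include <sstream>
#include <string>
#include <tuple>

using Record = std::tuple<std::string, std::string, int>;

// Hypothetical helper: fills a tuple from one ';'-delimited line.
// The indices passed to std::get are literals, so they are known at compile time.
bool parse_record(const std::string& line, Record& record)
{
    std::istringstream stream(line);
    return std::getline(stream, std::get<0>(record), ';') &&
           std::getline(stream, std::get<1>(record), ';') &&
           (stream >> std::get<2>(record));
}
```

    std::get returns a reference to the tuple element, so getline and >> can write into the tuple directly, and the return value says whether all three reads succeeded.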

  4. #4
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Are you explicitly practising the use of std::tuple? If not, a more traditional way of doing this would be:
    Code:
    struct Entry
    {
        std::string name;
        std::string address;
        int age;
    };
    
    std::istream& operator>>(std::istream& in, Entry& obj)
    {
        static const auto delimiter = ';';
        auto line = std::string();
        if (getline(in, line))
        {
            auto stream = std::stringstream(line);
            getline(stream, obj.name, delimiter) &&
                getline(stream, obj.address, delimiter) &&
                stream >> obj.age;
        }
        return in;
    }
    Then your code in main to read the entries would be simplified to something like:
    Code:
    auto entries = vector<Entry>();
    auto File = fstream("F:\\test.txt");
    if (File)
    {
        auto entry = Entry();
        while (File >> entry)
        {
            entries.push_back(entry);
        }
    }
    In other words, write a function (i.e., the overloaded operator>> for istream) to read and parse a single entry, then write another function (i.e., main, or possibly a helper function called from main) to read and parse the entire file using the previous function.
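
    Put together, the two pieces might look like the sketch below. It differs from the code above in two ways that are assumptions of this example rather than part of the post: a line that fails to parse sets failbit on the input stream, and the reading loop is wrapped in a made-up helper, read_entries, that accepts any std::istream so it can be tried without a file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Entry
{
    std::string name;
    std::string address;
    int age = 0;
};

// Same idea as the operator>> above, with one addition that is not in the
// post: a malformed line sets failbit on the input stream.
std::istream& operator>>(std::istream& in, Entry& obj)
{
    std::string line;
    if (std::getline(in, line))
    {
        std::istringstream stream(line);
        Entry parsed;
        if (std::getline(stream, parsed.name, ';') &&
            std::getline(stream, parsed.address, ';') &&
            (stream >> parsed.age))
        {
            obj = parsed;  // only overwrite the caller's entry on success
        }
        else
        {
            in.setstate(std::ios_base::failbit);
        }
    }
    return in;
}

// Hypothetical helper: reads entries until the stream ends or a line fails to parse.
std::vector<Entry> read_entries(std::istream& in)
{
    std::vector<Entry> entries;
    Entry entry;
    while (in >> entry)
    {
        entries.push_back(entry);
    }
    return entries;
}
```

    Because a malformed line stops the loop in this variant, a header line such as Name;Address;Age would have to be skipped first, for example with a single getline call before reading the entries.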

  5. #5
    Registered User
    Join Date
    Mar 2016
    Posts
    203
    @laserlight : this is brilliant, thank you so much.

    Is there a specific term to describe the getline() && getline() ... technique? It seems very handy and I'd like to read up a bit more on this.

    PS: FYI, both the following lines:
    Code:
     auto stream = std::stringstream(line);
    auto File = fstream("F:\\test.txt");
    give error messages saying that the corresponding constructors are deleted.

  6. #6
    Lurking whiteflags
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote Originally Posted by sean_cantab
    Is there a specific term to describe the getline() && getline() ... technique?
    No, getline() has a delimiter parameter. You can click the link to learn more.

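    Laserlight is using a newer C++11 feature: auto stream = std::stringstream(line); initializes the variable from a temporary, which needs the stream's move constructor, and your compiler's library apparently doesn't provide one. You can still declare the objects the normal way, e.g. std::stringstream stream(line);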
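
    For illustration, a small sketch of the "normal way": both streams are declared with direct initialization, which constructs the object in place and does not need a copy or move constructor. The helper names count_fields and can_open are made up for this example.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the ';'-separated fields in one line.
// The stream is declared with direct initialization, which does not
// need the stream type to be copyable or movable.
int count_fields(const std::string& line)
{
    std::stringstream stream(line);  // instead of: auto stream = std::stringstream(line);
    std::string field;
    int count = 0;
    while (std::getline(stream, field, ';')) {
        ++count;
    }
    return count;
}

// Hypothetical helper: opens a file for reading the same way and reports success.
bool can_open(const std::string& path)
{
    std::fstream file(path, std::ios::in);  // instead of: auto File = fstream("F:\\test.txt");
    return file.is_open();
}
```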

  7. #7
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    Quote Originally Posted by sean_cantab View Post
    Is there a specific term to describe the getline() && getline() ... technique? It seems very handy and I'd like to read up a bit more on this.
    You may be looking for the term short-circuit evaluation. In the expression A() && B(), A() is evaluated first, and if it is false, B() is never evaluated. The above example could also be written like this:

    Code:
    if (getline(stream, obj.name, delimiter))
        if (getline(stream, obj.address, delimiter))
            stream >> obj.age;
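
    To see the short-circuiting itself, here is a small sketch that is independent of streams. The helpers step and run_chain are made up for the illustration: each step records that it ran, so the returned list shows where the chain stopped.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: records that it was called, then returns the given result.
bool step(const std::string& name, bool result, std::vector<std::string>& calls)
{
    calls.push_back(name);
    return result;
}

// Evaluates A && B && C and returns the names of the steps that actually ran.
std::vector<std::string> run_chain(bool a, bool b, bool c)
{
    std::vector<std::string> calls;
    // && evaluates left to right and stops at the first false operand.
    (void)(step("A", a, calls) && step("B", b, calls) && step("C", c, calls));
    return calls;
}
```

    With run_chain(true, false, true) only A and B are recorded: B returns false, so C is never called. In the parsing code this is what keeps stream >> obj.age from running once an earlier getline has failed.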

  8. #8
    Registered User
    Join Date
    Mar 2016
    Posts
    203
    Thanks, I later realized that the short-circuit evaluation also works with my original vector<tuple> approach:

    Code:
    vector<tuple<string,string,int>>v;
    if (getline(stream, name, delimiter) &&    // reads stream into name up to the delimiter
        getline(stream, address, delimiter) && // reads stream into address up to the delimiter
        stream >> age)                         // reads the remaining field as an int
    {
        v.emplace_back(name, address, age);    // store the entry only if all three reads succeeded
    }
