Thread: Reading 2 csv files and combining their content into an output

  1. #16
    Registered User
    Join Date: Aug 2012
    Posts: 78
    Quote Originally Posted by Elysia:
    You didn't (and don't) have to write your own iterator. You can use the ones that already exist in the standard library. The one you have written just mimics istream_iterator anyway.
    You don't need to close files manually. They close automatically when they go out of scope. Learn to rely on this behaviour. It is a cornerstone of C++ called RAII.
    std::back_inserter creates a std::back_insert_iterator, which you can use to push items onto the back of a container. For example:

    Code:
        std::ifstream in("myfile.txt");
        std::vector<std::string> Lines;
        // istream_iterator<std::string> splits on whitespace, so each element is a word, not a whole line
        std::copy(std::istream_iterator<std::string>(in), std::istream_iterator<std::string>(), std::back_inserter(Lines));
        std::copy(Lines.begin(), Lines.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
    An empty vector has no elements (and no room for any), so you can't write elements into it through its begin() iterator. That is what a back inserter is for: it calls container.push_back(elem) for every element assigned through it.
    Hey Elysia! Thanks again for the hints (I will remove the close() calls and the template). However, I am wondering whether it is really necessary to have two copy statements. Why not copy things into the output file right away? Plus, the main issue I am dealing with is still the commas and formatting...
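    On the two-copy question: the intermediate vector is only needed if the data must be kept around. Below is a minimal sketch (the function name copy_words and the file names are hypothetical) that copies straight from an istream_iterator to an ostream_iterator and relies on RAII instead of calling close(). Like the quoted snippet, it works on whitespace-separated words, not whole lines.

    ```cpp
    #include <algorithm>
    #include <fstream>
    #include <iterator>
    #include <string>

    // Copies every whitespace-separated word from in_path to out_path,
    // one word per line, without an intermediate std::vector.
    // Returns false if either file could not be opened.
    bool copy_words(const std::string& in_path, const std::string& out_path)
    {
        std::ifstream in(in_path);
        if (!in)
            return false;
        std::ofstream out(out_path);
        if (!out)
            return false;

        // Nothing is stored, so no back_inserter is needed: each word read
        // from `in` is written to `out` immediately, followed by "\n".
        std::copy(std::istream_iterator<std::string>(in),
                  std::istream_iterator<std::string>(),
                  std::ostream_iterator<std::string>(out, "\n"));
        return true;
    }   // `in` and `out` are closed here by their destructors (RAII)
    ```

    Keep the vector version only if you need to inspect or reorder the words before writing them.
    
    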

  2. #17
    Registered User
    Join Date: Aug 2012
    Posts: 78
    So, I have adapted the code a bit: 1) I resolved the issue with the additional newline (by removing the last character from the string) and 2) I added the back_inserter functionality.
    Code:
    #include <cstdlib>     // atof
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <sstream>
    #include <string>
    #include <vector>

    using namespace std;

    int main( int argc, char * args[] )
    {
        ifstream ft( "input1.csv" );                    // features stream
        string ftline;

        // 1) Get number of data points
        string datapoints, dummy;
        getline(ft, datapoints, ',');                   // read very first value: # data points
        double count = atof(datapoints.c_str());
        getline(ft, dummy, '\n');                       // read and drop second value

        // 2) Start writing into output file
        string outfilename = "output.txt";
        ofstream fout(outfilename.c_str());

        for (int row = 1; row <= count; row++)
        {
            // enumerator
            fout << row << '\t';

            // format features
            getline(ft, ftline, '\n');

            // The "additional newline" is a '\r' from Windows line endings. Strip it only
            // if present: the last line of the file may have none, and removing its last
            // character unconditionally would drop a real character.
            if (!ftline.empty() && ftline[ftline.size() - 1] == '\r')
                ftline.erase(ftline.size() - 1);
            stringstream ss(ftline);

            vector<string> Line;
            copy(istream_iterator<string>(ss), istream_iterator<string>(), back_inserter( Line ));
            copy( Line.begin(), Line.end(), ostream_iterator<string>( fout, "\n" ) );
        }

        return 0;
    }
    Output file output.txt:
    1 1:4,2:5,3:7,4:4,5:7
    2 1:4,2:3,3:6,4:4,5:6
    3 1:4,2:4,3:7,4:4,5:7
    4 1:2,3:5,4:7,5:4,8:3
    5 2:3,4:4,5:7,,
    6 1:4,3:7,4:6,,
    7 ,,,,
    8 2:4,8:7,,,
    9 3:1,4:4,6:7,,
    10 1:5,2:8,,,
    11 1:1,2:5,,,
    12 3:4,4:5,6:8,
     
    Now the two remaining issues are: 1) get rid of those extra commas and 2) replace the colon ':' with an exclamation mark '!'.
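    One way to handle both issues in a single pass, sketched under the assumption that each row is a comma-separated list in which empty fields are only padding and any trailing '\r' has already been stripped (clean_row is a hypothetical helper name): split the row on ',', skip the empty fields, replace ':' in the rest and join them back together.

    ```cpp
    #include <algorithm>
    #include <sstream>
    #include <string>

    // Turns a row such as "2:3,4:4,5:7,," into "2!3,4!4,5!7":
    // empty comma-separated fields are dropped and ':' becomes '!'.
    std::string clean_row(const std::string& row)
    {
        std::istringstream fields(row);
        std::string field, result;
        while (std::getline(fields, field, ','))
        {
            if (field.empty())
                continue;                       // padding between ",," carries no data
            std::replace(field.begin(), field.end(), ':', '!');
            if (!result.empty())
                result += ',';                  // separator only between kept fields
            result += field;
        }
        return result;
    }
    ```

    With this, the body of the loop above reduces to fout << clean_row(ftline), and the vector/back_inserter step is no longer needed.
    
    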
     

     
    Last edited by in_ship; 05-07-2013 at 04:09 AM.

  3. #18
    C++まいる!Cをこわせ! (user title, roughly: "Here comes C++! Smash C!")
    Join Date: Oct 2007
    Location: Inside my computer
    Posts: 24,654
    So the general idea here is that if you have a file such as:

    token:token:token\n
    token:token:token\n

    Then you have two options:
    Either you read token by token and write the tokens back out separated by a separator of your choice, or you read a whole row, replace every separator in it with the one you want, and then write out the entire line.
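    Both options, sketched for a single row that has already been read (rejoin_tokens and replace_separators are hypothetical names, not library functions):

    ```cpp
    #include <algorithm>
    #include <sstream>
    #include <string>

    // Option 1: split the row into tokens and write them back out
    // with the separator of your choice.
    std::string rejoin_tokens(const std::string& row, char from, char to)
    {
        std::istringstream fields(row);
        std::string token, result;
        bool first = true;
        while (std::getline(fields, token, from))
        {
            if (!first)
                result += to;
            result += token;
            first = false;
        }
        return result;
    }

    // Option 2: keep the row as one string and swap every separator in place.
    std::string replace_separators(std::string row, char from, char to)
    {
        std::replace(row.begin(), row.end(), from, to);
        return row;
    }
    ```

    For "token:token:token" both produce "token,token,token". They differ on a trailing separator: getline does not report an empty token after the last separator, so rejoin_tokens("a:b:", ':', ',') gives "a,b", while replace_separators gives "a,b,". Option 1 is also the natural place to inspect or filter individual tokens.
    
    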

    Note that getline reads up to and including the delimiter given as its third argument, but it does not store that delimiter in the result: the delimiter is extracted from the stream and discarded. So if you read a line, the newline will not be in the string.
    If you read token by token, keep in mind that a token can also be terminated by the end of a row rather than by a separator (as in the example above). In that case, it is usually better to read a full row first and then parse the tokens out of it.
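    To illustrate the pitfall, here is a small sketch (tokens_naive and tokens_by_row are hypothetical names) comparing getline with ':' applied to the whole stream against reading rows first and then splitting each row:

    ```cpp
    #include <istream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Naive: getline with ':' on the whole stream. getline stops only at ':',
    // so the last token of a row is glued to the first token of the next row,
    // with the '\n' kept inside the token.
    std::vector<std::string> tokens_naive(std::istream& in)
    {
        std::vector<std::string> tokens;
        std::string tok;
        while (std::getline(in, tok, ':'))
            tokens.push_back(tok);
        return tokens;
    }

    // Row first: getline with the default '\n' delimiter reads one row
    // (the '\n' is extracted but not stored), then the row is split on ':'.
    std::vector<std::string> tokens_by_row(std::istream& in)
    {
        std::vector<std::string> tokens;
        std::string row, tok;
        while (std::getline(in, row))
        {
            std::istringstream fields(row);
            while (std::getline(fields, tok, ':'))
                tokens.push_back(tok);
        }
        return tokens;
    }
    ```

    For the input "t1:t2\nt3:t4\n", tokens_naive yields three tokens ("t1", "t2\nt3" and "t4\n"), while tokens_by_row yields the four expected ones.
    
    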

  4. #19
    whiteflags (Lurking)
    Join Date: Apr 2006
    Location: United States
    Posts: 9,612
    Remember the key to actually merging the files is to process them both at once.
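    A minimal sketch of that lockstep idea, assuming both inputs have already had their header lines consumed (merge_rows is a hypothetical name): read one row from each stream per iteration and stop as soon as either stream runs out, instead of relying only on the row count from the header.

    ```cpp
    #include <istream>
    #include <ostream>
    #include <string>

    // Merges two inputs line by line: row N of `features` and row N of `labels`
    // end up on the same output line, prefixed with a 1-based row number.
    // Stops as soon as either input runs out of lines.
    void merge_rows(std::istream& features, std::istream& labels, std::ostream& out)
    {
        std::string f, l;
        int row = 1;
        while (std::getline(features, f) && std::getline(labels, l))
            out << row++ << '\t' << f << '\t' << l << '\n';
    }
    ```

    Any per-row formatting (replacing ':' with '!', dropping empty fields) can be applied to f and l before they are written.
    
    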

    Quote Originally Posted by in_ship:
    Now, the two remaining issues are still: 1) Do away with those extra commas and 2) Replace colon ':' with exclamation mark '!'.
    You could also throw a regular expression at the problem. C++11 added a regex library to the standard, and if you're using MSVC++ 2010, you already have it. I used it with a lot of success:
    Code:
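    #include <regex>
    #include <sstream>
    #include <string>
    using namespace std;

    // Keeps only well-formed "index:value" tokens (the value may have a decimal part),
    // turns each ':' into '!', and joins the tokens with commas.
    string process_line(const string& line)
    {
        // digits ':' digits, optionally '.' digits, followed by ',' or the end of the line.
        // The backslashes are doubled so the regex receives \d and \. ("\." alone is not
        // a valid C++ escape, and an unescaped '.' would match any character).
        regex tokenRegex("\\d+:\\d+(\\.\\d+)?(,|$)");
        sregex_iterator tokensBegin(line.begin(), line.end(), tokenRegex);
        sregex_iterator tokensEnd;

        ostringstream builder;
        for ( sregex_iterator i = tokensBegin; i != tokensEnd; ++i )
        {
            string match = i->str();

            string::size_type where = match.find(':');   // always found: the regex requires a ':'
            match[where] = '!';
            builder << match;
        }

        // Drop the comma that followed the last kept token, if any.
        string result = builder.str();
        if (!result.empty() && result.back() == ',')
            result.pop_back();

        return result;
    }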
    Running it on both files, I managed to produce:
    Code:
    1	1!4,2!5,3!7,4!4,5!7	5!1.0
    2	1!4,2!3,3!6,4!4,5!6	1!1.0,4!1.0
    3	1!4,2!4,3!7,4!4,5!7	2!1.0,3!1.0
    4	1!2,3!5,4!7,5!4,8!3	3!1.0,8!1.0
    5	2!3,4!4,5!7	4!1.0
    6	1!4,3!7,4!6	5!1.0
    7		
    8	2!4,8!7	2!1.0
    9	3!1,4!4,6!7	3!1.0
    10	1!5,2!8	4!1.0,7!1.0
    11	1!1,2!5	3!1.0,4!1.0,5!1.0
    12	3!4,4!5,6!8	1!1.0,5!1.0
    Python also does regex and has for a long time...
    Last edited by whiteflags; 05-07-2013 at 02:38 PM.

  5. #20
    Registered User
    Join Date: Aug 2012
    Posts: 78
    I think what I was trying to do initially was total overkill. This piece of code got the work done in less than 10 minutes.

    Code:
    /*
     * FileParser.cpp
     *
     *  Created on: May 5, 2013
     *      Author: in_ship
     */

    #include <algorithm>   // replace
    #include <cstdlib>     // atof
    #include <fstream>
    #include <iostream>
    #include <string>

    using namespace std;

    int main( int argc, char * args[] )
    {
        ifstream ft("./input1.csv");        // features stream
        ifstream lbl("./input2.csv");       // labels stream
        if (!ft || !lbl)
        {
            cerr << "could not open the input files\n";
            return 1;
        }

        // 1) Get number of data points
        string datapoints, dummy;
        getline(ft, datapoints, ' ');       // read very first value: # data points
        double count = atof(datapoints.c_str());
        getline(ft, dummy, '\n');           // read and drop the rest of the header line
        getline(lbl, dummy, '\n');          // read and drop the whole header line

        // 2) Start writing into output file
        string outfilename = "./output.txt";
        ofstream fout(outfilename.c_str());

        // Both files are read in lockstep: row N of each goes on output line N.
        for (int row = 1; row <= count; row++)
        {
            // enumerator
            fout << row << '\t';

            // format features: drop the stray last character, then ':' -> '!' and ' ' -> ','
            string ftline;
            getline(ft, ftline, '\n');
            if (!ftline.empty())
                ftline.erase(ftline.size() - 1);
            replace(ftline.begin(), ftline.end(), ':', '!');
            replace(ftline.begin(), ftline.end(), ' ', ',');
            fout << ftline << '\t';

            // format labels the same way
            string lblline;
            getline(lbl, lblline, '\n');
            replace(lblline.begin(), lblline.end(), ':', '!');
            replace(lblline.begin(), lblline.end(), ' ', ',');
            fout << lblline << '\n';
        }

        cout << "done\n";
        return 0;
    }
