Hello again everyone, and thank you for your time. I was examining prospects on ELance the other day, and I found one that looked like a good chance to practice some C++ with the standard template library...
I strongly suspect it was homework, because the request for proposal was closed almost immediately. I did write a solution using GNU's C++ compiler though, and I would be interested in someone's insight. I used the STL reference at cplusplus.com and found what I needed to finish it up, although I don't have access to test material on the scale the request for proposal stipulated.
The request for proposal provided the problem statement rather directly:
I have two input files:
* file 1 containing suffixes, description 1 of suffixes, description 2 of suffixes, ranking in float or decimal format (comma delimited)
* file 2 containing words and rankings (char and int, possibility that the rankings would be all the same in which case we should sort by alphabetical order on the word)
I need to produce an output file containing a list that shows all words that had a matching suffix in the suffix file, sorted by the ranking in file 2.
The format of the output file (comma delimited) should be:
word from file 2, suffix from file 1, ranking from file 2, description 1 from file 1, description 2 from file 1, ranking from file 1
This is a simple C++ program, I expect an experienced programmer could write it in under an hour, I simply don't have the time to play with string arrays vs. character arrays, and so on.
Sample file 1:
ed, latin suffix, roman suffix, 12.46
ing, german suffix, greek suffix, 4.45
tion, french suffix, german suffix, 4.45
Sample file 2:
Declared, 3
Bastion, 4
Tiring, 4
Output should look like:
Declared, ed, 3, latin suffix, roman suffix, 12.46
Bastion, tion, 4, french suffix, german suffix, 4.45
Tiring, ing, 4, german suffix, greek suffix, 4.45
The program should not be case sensitive at all. Sort order for the output file should be alphanumeric on field 3, field 6, field 1, field 2, field 4, field 5.
This is a simple program but I haven't played with C++ in a while. The program should come fully commented and take 3 args, file 1 name, file 2 name, file 3 name. Keep in mind that I'm running on windows so I should be able to give a full file name including backslashes (so you'll probably have to escape them). File 1 is around 500 lines, file 2 is around 500,000 lines so you need to make this perform decently. I can't imagine this program taking more than 10 minutes to run.
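As a side note on the file format quoted above, here is a minimal sketch of how one of those comma-delimited lines might be split into fields. The helper names trim and splitFields are placeholders of my own, not part of the request, and the sketch assumes that no field contains a comma or quotation marks.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Strip leading and trailing whitespace (including a stray '\r'
// left over from Windows line endings) from one field.
std::string trim(const std::string &s) {
    const char *ws = " \t\r\n";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return "";
    }
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split a line such as "ed, latin suffix, roman suffix, 12.46"
// into trimmed fields.
// Assumption: fields never contain commas or quotes themselves.
std::vector<std::string> splitFields(const std::string &line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ',')) {
        fields.push_back(trim(field));
    }
    return fields;
}
```

A line from file 1 should come back as four fields and a line from file 2 as two; the numeric fields would still need converting from text afterwards.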
When I did the ERD between the suffixes and the word occurrences, I noticed that there would be a one-to-many relationship. In an attempt to avoid having multiple copies of the suffixes in a list that will eventually be sorted, I checked the STL for a mapping-type structure. I didn't find one that satisfied me, but I'm up for more reading at cplusplus.com if you have any suggestions. I would really like to talk this one over with any of you if you have the time or inclination.
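For the record, a std::map keyed on the lower-cased suffix is one candidate for that mapping. What follows is only a minimal sketch, assuming C++11: SuffixInfo is a hypothetical stand-in for my Suffix class, and toLower, addSuffix and findSuffix are made-up names. It also assumes that when several suffixes match, the longest one should win, which the request does not actually say.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical stand-in for the Suffix class (the real one is not shown).
struct SuffixInfo {
    std::string text, desc1, desc2;
    double rank;
};

// Lower-cased copy, so that matching ignores case.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

typedef std::map<std::string, SuffixInfo> SuffixTable;

// Store each suffix once, keyed by its lower-cased text.
void addSuffix(SuffixTable &table, const SuffixInfo &s) {
    table[toLower(s.text)] = s;
}

// Try every tail of the word, longest first, and return a pointer to
// the stored suffix (not a copy), or nullptr when nothing matches.
// Assumption: the longest matching suffix is the one that is wanted.
const SuffixInfo *findSuffix(const SuffixTable &table, const std::string &word) {
    const std::string lower = toLower(word);
    for (std::string::size_type start = 0; start < lower.size(); ++start) {
        SuffixTable::const_iterator it = table.find(lower.substr(start));
        if (it != table.end()) {
            return &it->second;
        }
    }
    return nullptr;
}
```

Each word then costs one map lookup per character of the word rather than one comparison per suffix, and pointers to std::map elements stay valid while other entries are inserted. Whether this actually beats the linear search for about 500 suffixes would need measuring.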
So, keeping the copies lying around, this is the structure that I settled on for an 'etymology' of a word with its suffix:
Code:
    class Etymology {
    private:
        Word w;    // the word and its ranking, as read from file 2
        Suffix s;  // a copy of the matching suffix from file 1
    public:
        // Note: v is passed by value, so the suffix vector is copied on every call.
        Etymology(fstream &in, vector<Suffix> v);
        bool empty() const { return w.getWord().empty(); }
        bool operator<(const Etymology &z) const;
        friend fstream &operator<< (fstream &out, const Etymology &e);
    };
During construction, I read in the next word, then do a linear search for its suffix, and finally make a copy. It's a little slow, but it lends itself to parallelism, at least until I get a chance to measure its run times:
Code:
    Etymology::Etymology(fstream &in, vector<Suffix> v) {
        bool err;
        unsigned int i;
        vector<Suffix>::size_type sz = v.size();

        in >> w;                       // read the next word record
        if(w.getWord().empty()) {      // nothing was read (e.g. end of file)
            return;
        }
        // Linear search: take the first suffix that matches the word.
        // Word == Suffix is the suffix-match test (operator not shown).
        for(i = 0, err = true; i < sz; i++) {
            if(w == v[i]) {
                s = v[i];              // keep a copy of the matching suffix
                err = false;
                break;
            }
        }
        if(err) {
            // The word is still kept, with a default-constructed Suffix.
            cerr << "No matching suffix for: " << w.getWord() << endl;
        }
        return;
    }
This is the routine where the etymologies are extracted and organized in the summary of word statistics:
Code:
    void Summary::matchWords() {
        fstream fin(words.c_str(), ios::in);
        if( ! fin ) {
            cerr << "Unable to open " << words.c_str() << " for reading words." << endl;
            cerr << "Exiting...." << endl;
            exit(1);
        }
        // Build one Etymology per word until the stream runs out.
        do {
            Etymology e(fin, suffixes);
            if( ! e.empty()) {
                observation.push_back(e);
            }
        } while(fin.good());
        fin.close();
        suffixes.clear();      // safe only because each Etymology holds its own copy
        observation.sort();    // orders by Etymology::operator<
        return;
    }
Is there some way to do that using the Standard Template Library without leaving a large number of copies? I thought about different reference types, but without something like making the ordering operator static, nothing clicked for me.
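One direction that might avoid the copies is to let each record hold a pointer to a shared suffix row and to sort with an ordinary comparison function. This is a rough sketch, assuming C++11, not my actual classes: SuffixRow, Match, makeMatch, matchLess and sortMatches are all hypothetical names, a std::vector with std::sort is swapped in for the list and its sort(), and it assumes the "alphanumeric" ordering on the two ranking fields really means numeric order.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for one line of file 1.
struct SuffixRow {
    std::string text, desc1, desc2;
    double rank;
};

// One matched word from file 2. The suffix is referenced, not copied.
struct Match {
    std::string word;         // word as it appeared in file 2
    std::string key;          // lower-cased word, computed once for sorting
    int rank;                 // ranking from file 2
    const SuffixRow *suffix;  // points into the shared suffix storage
};

Match makeMatch(const std::string &word, int rank, const SuffixRow *suffix) {
    Match m;
    m.word = word;
    m.key = word;
    std::transform(m.key.begin(), m.key.end(), m.key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    m.rank = rank;
    m.suffix = suffix;
    return m;
}

// Order from the request: field 3, field 6, field 1, field 2, field 4, field 5.
// Assumption: rankings compare numerically; suffix text is compared as stored.
bool matchLess(const Match &a, const Match &b) {
    return std::tie(a.rank, a.suffix->rank, a.key,
                    a.suffix->text, a.suffix->desc1, a.suffix->desc2)
         < std::tie(b.rank, b.suffix->rank, b.key,
                    b.suffix->text, b.suffix->desc1, b.suffix->desc2);
}

void sortMatches(std::vector<Match> &matches) {
    std::sort(matches.begin(), matches.end(), matchLess);
}
```

With the three sample rows from the request this ordering gives Declared, Bastion, Tiring, which agrees with the sample output. The catch is lifetime: the pointers are only valid while the suffix storage is neither resized nor cleared, so a call like suffixes.clear() would have to wait until after the output file has been written.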
Thank you again. Please do have a great day.
Best Regards,
New Ink -- Henry