Thread: Using/ Searching through Multiple Vectors

1. Using/ Searching through Multiple Vectors

Hi,

Say you have a program using multiple vectors, and you want to search through them all for individual items. What would be the best way to declare/ use the vectors? In the following program I have two vectors of strings, nouns and verbs. I then search through them both for a word the user enters.

But say you had loads of different vectors for things like adjectives, adverbs, prepositions, etc. Would it be better to declare the vectors as part of a multi-dimensional vector? And would that mean you didn't need to write as much code searching through them all? i.e. you could just use find once?

Thanks.

Code:
```#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

const char *v[4] = {"run", "scream", "jump", "procrastinate"};
vector<string> verbs(v, v+4);
const char *n[4] = {"car", "wall", "dog", "basketball"};
vector<string> nouns(n, n+4);

vector<string>::iterator iter;

int main() {
    string string1;
    cout << "Enter a word: ";
    cin >> string1;
    // Search each vector in turn for the word the user entered.
    if (find (nouns.begin(), nouns.end(), string1) != nouns.end())
        cout << string1 << " is a noun.";
    else if (find (verbs.begin(), verbs.end(), string1) != verbs.end())
        cout << string1 << " is a verb.";
    else
        cout << string1 << " was not found.";  // assumed: the posted else branch had no body
}```

2. >> What would be the best way to declare/ use the vectors? In the following program I have two vectors of strings, nouns and verbs. I then search through them both for a word the user enters.

If you're going to use vectors, keep them sorted and use binary_search - it's fast and efficient. Otherwise I'd recommend std::map.

3. Using a vector of vectors sounds like a sound idea. You can then use a loop to search among all word types.

Probably also an enum like this comes in handy.

Code:
`enum WordType {Verb, Noun, Adjective, Preposition, ..., MaxWordType };`

4. Personally, I'd just have separate variables for each vector you need. There is a finite number (how many parts of speech can there be?) and it will help with clarity if they can be named clearly. The enum anon suggested could help with that, but I'd keep it simple and just use variables.

As far as searching the vectors goes, you'll likely have to search all of them individually. I guess if you had a vector or array of vectors you could do the search in a loop or equivalent algorithm, so that is one reason not to take my advice from the first paragraph. But unless you combine all the words into one big data structure then you have to search separately.

Of course, you could keep all the words mixed together in the same data structure, and just include the part of speech as a piece of data (either by making a small class or by storing a pair that holds the word and part of speech). Depending on what your other needs are this might be a good option that is better for searching.

5. Originally Posted by anon
Using a vector of vectors sounds like a sound idea. You can then use a loop to search among all word types.
Would I declare a vector of vectors for the example like this?

Code:
`vector<vector<string>> words_vector;`
If so, how would I go about searching through each of the vectors of strings inside it? Presumably I would declare an iterator like this:

Code:
`vector<vector<string>>::iterator iter;`
And then:

Code:
`for (iter = words_vector.begin(); iter != words_vector.end(); iter++)`
But how do I access the contents of iter inside that for loop? Thanks.

6. Originally Posted by Sebastiani
>> What would be the best way to declare/ use the vectors? In the following program I have two vectors of strings, nouns and verbs. I then search through them both for a word the user enters.

If you're going to use vectors, keep them sorted and use binary_search - it's fast and efficient. Otherwise I'd recommend std::map.
I was thinking about using map- the only problem is, some words have more than one type. So something like 'squash' could be a verb or a noun, if it's referring to the drink. From what I've read the word's name would be the key and every element has to have a unique key.

7. If you want a map where elements don't have unique keys, that's what multimap is for.

>> From what I've read the word's name would be the key
As long as you'll never need to search based on the type, then making the word the key and the type the value makes sense.

8. Say I'm using a multimap. Would something like this make sense:

Code:
```#include <iostream>
#include <string>
#include <map>
using namespace std;

int main()
{
const char* noun = "noun";
const char* verb = "verb";
multimap<string, const char*> m0;
multimap<string, const char*>::iterator it;
m0.insert ( pair<string, const char*>("squash", noun) );
m0.insert ( pair<string, const char*>("squash", verb) );
for ( it=m0.begin() ; it != m0.end(); it++ )
cout << (*it).first << " " << (*it).second << endl;
}```
Also, is const char* the right type to use for the mapped values? I figured because the noun/ verb etc. won't be changing it'd be better to use const with them.

What about string for the key values? Does that make sense or should I use const char* for that as well? After all, once someone has decided to put a word into the map, its key value (what the word is) won't change. The only thing is, it seems like it's easier to work with string.

And is it worth having verb/ noun variables declared or would I be better off just putting in "noun", "verb" each time?

Thanks.

9. I would use string for both. It will be clearer and there really isn't much downside. You can make a string const as well if you wanted, although it's not necessary here.

If you're planning on using multiple parts of speech, you can also use an enum.
Code:
`enum part_of_speech { noun, verb };`
Then just use part_of_speech as your type instead of const char*. The only additional code you'd need is to convert the enum to a string. A simple function or array could do that if you went that route.

10. Originally Posted by Daved
I would use string for both. It will be clearer and there really isn't much downside. You can make a string const as well if you wanted, although it's not necessary here.

If you're planning on using multiple parts of speech, you can also use an enum.
Code:
`enum part_of_speech { noun, verb };`
Then just use part_of_speech as your type instead of const char*. The only additional code you'd need is to convert the enum to a string. A simple function or array could do that if you went that route.
I'm experimenting with this to learn more about it. Say you've got a loop that reads words from files into each of the different parts of the map that these enum values refer to. That's what I could have in this program I'm making based around this stuff (it's intended to be part of a chat bot someday). Basically each file contains nouns, verbs, whatever, in the following format: "cat car dog door person wall" and so on. I've posted the relevant part of it up here to show what I mean.

Anyway in the program, I'm currently using this in the aforementioned loop, to transfer words from all the different files into the words map:

Code:
`wordsmap.insert ( pair<string, const char*>(word1, wordtypes[filenum]) );`
But say I change to using an enum. Can I still do something like this? I ask because I tried declaring an enum like this:

Code:
`enum part_of_speech {adjective, adverb, noun, other, preposition, verb, newword};`
And then using:

Code:
`wordsmap.insert ( pair<string, part_of_speech>(word1, part_of_speech[filenum]) );`
But it didn't work. So is there any way I can do that? As I mentioned I've posted up my code to show the context. Also if you see any other mistakes I've made I'd be glad to get some feedback about how I could improve it. Thanks.

Code:
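```#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
#include <cstring>  // strcpy_s (a Microsoft/C11 Annex K function, not available on every compiler)
#include <cctype>   // tolower
#define NUM_OF_FILES 6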
using namespace std;

vector<string> wordsdone;
multimap<string, const char*> wordsmap;
multimap<string, const char*>::iterator mmiter;

int main() {
char input_line[500];
string input_string, word1;
int stpos, endpos;
stpos = 0, endpos = -1;
int filenum = 0;
char filename[20];
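// Assumed reconstruction: the start of this array and the wordtypes array are missing from the post.
// The first three file names and the wordtypes entries are guesses based on the enum order used above.
const char* filenames[NUM_OF_FILES] = {"adjectives.txt", "adverbs.txt", "nouns.txt",
"other.txt", "prepositions.txt", "verbs.txt"};
const char* wordtypes[NUM_OF_FILES + 1] = {"adjective", "adverb", "noun", "other", "preposition", "verb", "newword"};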
strcpy_s(filename, filenames[filenum]);
ifstream file_in(filename);
cout << "Reading from file number " << filenum << ": ";
// Read different types of words from text files into appropriate vectors- nouns.txt into nouns vector etc.
// Also read words from chosen file into words vector.
while (filenum <= NUM_OF_FILES) {
// 1. Get a line of input and put it in input_string.
file_in.getline(input_line, 499);
input_string = input_line;
if (filenum == NUM_OF_FILES) // If it's from the text, make it all lower case.
for (stpos = 0; stpos < input_string.length(); stpos++)
if (input_string[stpos] >= 65 && input_string[stpos] <= 90) // capital letters
input_string[stpos] = tolower(input_string[stpos]);
// 2. Find individual words in input_string. 3. Add them to nouns/verbs/words depending on the file.
for (stpos = 0; stpos < input_string.length(); stpos++) {
// If a letter is found at the start of input_string, stop searching for a start position.
if (input_string[stpos] >= 97 && input_string[stpos] <= 122) {
for (endpos = stpos; endpos <= input_string.length(); endpos++) {
// If a non letter is found at the end of input_string, stop searching for an end position.
if (input_string[endpos] < 97 || input_string[endpos] > 122) {
word1.assign(input_string, stpos, endpos-stpos);
cout << word1 << ",";
if (filenum <= NUM_OF_FILES)
wordsmap.insert ( pair<string, const char*>(word1, wordtypes[filenum]) );
stpos = endpos; // check for a new word after this one, starting from endpos
//  (endpos is currently at the first non-letter after the word.)
// Enter all the different words in worddone once- so the following classification bit
// doesn't ask for the type of the same word twice (although this doesn't really make sense
// because a word can have multiple types).
if (filenum < NUM_OF_FILES &&
find(wordsdone.begin(), wordsdone.end(), word1) == wordsdone.end())
wordsdone.push_back(word1);
break;          // stop it going to the end of input_string in this for loop.
}
}
}
}
if (file_in.eof()) {
filenum++;
file_in.close();
file_in.clear();
if (filenum <= NUM_OF_FILES-1) {
strcpy_s(filename, filenames[filenum]);
file_in.open(filename);
}
else if (filenum == NUM_OF_FILES)
if (!file_in) {
ofstream file_out(filename);
cout << "(creating " << filename << ")";
file_out.close();
file_in.open(filename); // note this doesnt work yet. still dont know why.
}
if (filenum <= NUM_OF_FILES)
cout << endl << "Reading from file number " << filenum << ": ";
}
}
cout << endl << endl << "Finished reading from files." << endl << endl;
file_in.close();
return 0;
}```

11. Code:
`wordsmap.insert ( pair<string, part_of_speech>(word1, part_of_speech[filenum]) );`
An enum is not something indexable. It is a list of constants whose values are automatically generated (unless you specify them).

Another likely problem: you can't increment a variable of enum type directly, and arithmetic on it gives you an int rather than the enum. You can use a regular int whose values are limited to the enum values, but an int won't convert back to the enum implicitly, so you can't store it in a map whose value type is part_of_speech without a cast such as static_cast<part_of_speech>(filenum). The simpler route is a map<string, int>. The enum might then only be necessary to avoid magic values in the code: e.g. words[adjective][n] is much better than words[2][n].

-------

Also, when inserting into the map, you can use std::make_pair: this way you wouldn't need to type the template arguments:

Code:
`wordsmap.insert ( make_pair(word1, filenum) );`