Thread: Fastest way of examining vector

1. Fastest way of examining vector

Hi

I am reading a text file containing values like below (file may be 16.000 to 100.000 lines)

Value1-Value2
A-B
A-C
A-D
K-M
K-F
B-K
B-A

Each line is read and parsed with a parser. The values are pushed into a vector as "A-B" pairs. But at the same time, I also need to count the occurrence of any value in first column

To do so;
1 ) I can create two vectors, one to hold "A-B" pairs and the other to hold a STL Map container (key to be "A" and value is "3" for example). But, each time I read a new line, I have to trace the second vector, find relevant key and increase the related value by 1 and update the map vector. this is does not seem so efficient

2 ) I can read file twice. At first round, I can create the map vector (that holds the name of element and number of its occurrence). In second tour, I create the "A-B" pairs. But reading file twice also seems a little bit "ad performance solution"

3 ) I can use some sql engine, like Sqlite. Like reading file once and fill my sql table, and do the rest of the job with SQL. I really do not have an idea about the performance of this solution.

Any other options are welcome.

2. If you need the data in the original order as stored in the file, then storing pairs in a vector is fine. If you need to count the occurrence of data in the first column, then storing a map to map the data values to their occurrences is fine. Since you need both, populate both, in the same loop with which you read from file.

Incidentally, unless you also need the occurence map to be sorted by the data, consider if std::unordered_map would be suitable.

3. So, you mean
Code:
```while(not_end_of_file) {
assign_vector();
update_map_vector();
}

update_map_vector(){
while(search_map())
if(related_key_found)
update_value();
}```
But my concern is, doesn't it take too much time each time I read a line to perform update_map_vector(). The values in the file may not be in a sorted order.

4. Originally Posted by fnoyan
But my concern is, doesn't it take too much time each time I read a line to perform update_map_vector().
Finding the key in a std::map takes O(log n) time in the worst case; finding the key in a std::unordered_map takes constant time in the average case. With a mere 100000 entries max to search for a short string, I daresay even linear search would be acceptable (EDIT: okay, maybe not linear search since this will be called in a loop.)

That said, note that your logic is slightly off: you need to insert into the map if the key is not found. Using the overloaded operator[] makes this simple, though very slightly inefficient.