# word count problem

This is a discussion on word count problem within the C++ Programming forums, part of the General Programming Boards category; Originally Posted by MK27 obviously you would only need a single map, not a vector + map there If we ...

1. Originally Posted by MK27
obviously you would only need a single map, not a vector + map there
If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting, and then transfer the result to the vectors in a single pass.

An efficient solution without using a map or unordered_map would be to read in the entire list of words into a vector, sort it, and then count the words in a single pass... but this may not be very space efficient.

2. Binary tree!

Altho that's what std::map is, I believe (an RB tree).

Originally Posted by laserlight
If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting,
I honestly don't think so, take a close look at how the count happens (this is why I said it had a certain elegance).

Code:
```  words_count [place] +=1 ;
[...or...]
words.push_back (x) ;
words_count.push_back  (1) ;```
Presuming the iteration thru the vector would have to happen anyway (yielding place), this is very minimal.

3. Originally Posted by MK27
Binary tree!

Altho that's what std::map is, I believe (an RB tree).
Yes.

Originally Posted by MK27
I honestly don't think so, take a close look at how the count happens (this is why I said it had a certain elegance).
(...)
Presuming the iteration thru the vector would have to happen anyway (yielding place), this is very minimal.
No, it is still O(n^2) time versus O(n log n) time. You must take a close look at the loop, not just highlighting the parts that look nice.

4. Originally Posted by laserlight
No, it is still O(n^2) time
The part to do with the word vector is, the count vector is O(1) because it uses the same value:

Code:
```        for (teller =0; teller < size; teller ++)
// this will result in O(n^2)
{
if (x == words [teller])
{
found = 1 ;
place = teller ;
}
}
if (found == 1)
{
found = 0 ;
// this is O(1)
words_count [place] +=1 ;
}```

5. Originally Posted by MK27
The part to do with the word vector is, the count vector is O(1) because it uses the same value:
Yes, but that O(1) is dominated by the O(n^2). You cannot increment the count, or set it to 1, without first finding the word, or discovering that it is not in the list. If you use a map, the counting is done in O(n log n) time, then the transfer to the vectors is done in O(n) time, which is dominated by the O(n log n).

6. Hello,

This is way to high for me now but very interresting.

Roelof

7. Originally Posted by laserlight
Yes, but that O(1) is dominated by the O(n^2). You cannot increment the count, or set it to 1, without first finding the word, or discovering that it is not in the list. If you use a map, the counting is done in O(n log n) time, then the transfer to the vectors is done in O(n) time, which is dominated by the O(n log n).
For sure, WRT using one map<string,int>. I was talking about this:

Originally Posted by laserlight
If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting, and then transfer the result to the vectors in a single pass.
HMM, I guess I missed the point of that final phrase so there is some miscommunication. I had it in my head you meant to push them onto a vector, using the iterative search, then increment a map. Of course, if you actually put the map in the vector would seem somewhat superfluous, so maybe a silly discussion.

Page 3 of 3 First 123
Popular pages Recent additions