Thread: word count problem

  1. #31
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MK27
    obviously you would only need a single map, not a vector + map there
    If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting, and then transfer the result to the vectors in a single pass.

    An efficient solution without using a map or unordered_map would be to read in the entire list of words into a vector, sort it, and then count the words in a single pass... but this may not be very space efficient.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  2. #32
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Binary tree!

    Altho that's what std::map is, I believe (an RB tree).

    Quote Originally Posted by laserlight View Post
    If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting,
    I honestly don't think so, take a close look at how the count happens (this is why I said it had a certain elegance).

    Code:
                words_count [place] +=1 ;
                       [...or...]
                words.push_back (x) ;
                words_count.push_back  (1) ;
    Presuming the iteration thru the vector would have to happen anyway (yielding place), this is very minimal.
    Last edited by MK27; 06-18-2010 at 11:01 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #33
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MK27
    Binary tree!

    Altho that's what std::map is, I believe (an RB tree).
    Yes.

    Quote Originally Posted by MK27
    I honestly don't think so, take a close look at how the count happens (this is why I said it had a certain elegance).
    (...)
    Presuming the iteration thru the vector would have to happen anyway (yielding place), this is very minimal.
    No, it is still O(n^2) time versus O(n log n) time. You must take a close look at the loop, not just highlighting the parts that look nice.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #34
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by laserlight View Post
    No, it is still O(n^2) time
    The part to do with the word vector is, the count vector is O(1) because it uses the same value:

    Code:
            for (teller =0; teller < size; teller ++)
    // this will result in O(n^2)
            {
                if (x == words [teller])
                {
                    found = 1 ;
                    place = teller ;
                }
            }
            if (found == 1)
            {
                found = 0 ;
    // this is O(1) 
                words_count [place] +=1 ;
            }
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #35
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MK27
    The part to do with the word vector is, the count vector is O(1) because it uses the same value:
    Yes, but that O(1) is dominated by the O(n^2). You cannot increment the count, or set it to 1, without first finding the word, or discovering that it is not in the list. If you use a map, the counting is done in O(n log n) time, then the transfer to the vectors is done in O(n) time, which is dominated by the O(n log n).
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #36
    Registered User
    Join Date
    May 2010
    Posts
    230
    Hello,

    This is way to high for me now but very interresting.

    Roelof

  7. #37
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by laserlight View Post
    Yes, but that O(1) is dominated by the O(n^2). You cannot increment the count, or set it to 1, without first finding the word, or discovering that it is not in the list. If you use a map, the counting is done in O(n log n) time, then the transfer to the vectors is done in O(n) time, which is dominated by the O(n log n).
    For sure, WRT using one map<string,int>. I was talking about this:

    Quote Originally Posted by laserlight View Post
    If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting, and then transfer the result to the vectors in a single pass.
    HMM, I guess I missed the point of that final phrase so there is some miscommunication. I had it in my head you meant to push them onto a vector, using the iterative search, then increment a map. Of course, if you actually put the map in the vector would seem somewhat superfluous, so maybe a silly discussion.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Seg Fault in Compare Function
    By tytelizgal in forum C Programming
    Replies: 1
    Last Post: 10-25-2008, 03:06 PM
  2. Replies: 1
    Last Post: 03-06-2005, 10:12 AM
  3. Wrong Output
    By egomaster69 in forum C Programming
    Replies: 7
    Last Post: 01-28-2005, 06:44 PM
  4. Word COM problem
    By UnclePunker in forum Windows Programming
    Replies: 6
    Last Post: 01-06-2005, 11:51 AM
  5. MS Word Problem
    By golfinguy4 in forum Tech Board
    Replies: 8
    Last Post: 06-22-2003, 01:33 AM