word count problem

This is a discussion on word count problem within the C++ Programming forums, part of the General Programming Boards category; Originally Posted by MK27 obviously you would only need a single map, not a vector + map there If we ...

  1. #31
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,794
    Quote Originally Posted by MK27
    obviously you would only need a single map, not a vector + map there
    If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting, and then transfer the result to the vectors in a single pass.

    An efficient solution without using a map or unordered_map would be to read in the entire list of words into a vector, sort it, and then count the words in a single pass... but this may not be very space efficient.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  2. #32
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Binary tree!

    Altho that's what std::map is, I believe (an RB tree).

    Quote Originally Posted by laserlight View Post
    If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting,
    I honestly don't think so, take a close look at how the count happens (this is why I said it had a certain elegance).

    Code:
      words_count [place] +=1 ;
                       [...or...]
                words.push_back (x) ;
                words_count.push_back  (1) ;
    Presuming the iteration thru the vector would have to happen anyway (yielding place), this is very minimal.
    Last edited by MK27; 06-18-2010 at 11:01 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #33
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,794
    Quote Originally Posted by MK27
    Binary tree!

    Altho that's what std::map is, I believe (an RB tree).
    Yes.

    Quote Originally Posted by MK27
    I honestly don't think so, take a close look at how the count happens (this is why I said it had a certain elegance).
    (...)
    Presuming the iteration thru the vector would have to happen anyway (yielding place), this is very minimal.
    No, it is still O(n^2) time versus O(n log n) time. You must take a close look at the loop, not just highlighting the parts that look nice.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #34
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by laserlight View Post
    No, it is still O(n^2) time
    The part to do with the word vector is, the count vector is O(1) because it uses the same value:

    Code:
            for (teller =0; teller < size; teller ++)
    // this will result in O(n^2)
            {
                if (x == words [teller])
                {
                    found = 1 ;
                    place = teller ;
                }
            }
            if (found == 1)
            {
                found = 0 ;
    // this is O(1) 
                words_count [place] +=1 ;
            }
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #35
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,794
    Quote Originally Posted by MK27
    The part to do with the word vector is, the count vector is O(1) because it uses the same value:
    Yes, but that O(1) is dominated by the O(n^2). You cannot increment the count, or set it to 1, without first finding the word, or discovering that it is not in the list. If you use a map, the counting is done in O(n log n) time, then the transfer to the vectors is done in O(n) time, which is dominated by the O(n log n).
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #36
    Registered User
    Join Date
    May 2010
    Posts
    230
    Hello,

    This is way to high for me now but very interresting.

    Roelof

  7. #37
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by laserlight View Post
    Yes, but that O(1) is dominated by the O(n^2). You cannot increment the count, or set it to 1, without first finding the word, or discovering that it is not in the list. If you use a map, the counting is done in O(n log n) time, then the transfer to the vectors is done in O(n) time, which is dominated by the O(n log n).
    For sure, WRT using one map<string,int>. I was talking about this:

    Quote Originally Posted by laserlight View Post
    If we really wanted to have a vector of words and a corresponding vector of counts, it would still be more time efficient to make use of a map or unordered_map to do the counting, and then transfer the result to the vectors in a single pass.
    HMM, I guess I missed the point of that final phrase so there is some miscommunication. I had it in my head you meant to push them onto a vector, using the iterative search, then increment a map. Of course, if you actually put the map in the vector would seem somewhat superfluous, so maybe a silly discussion.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

Page 3 of 3 FirstFirst 123
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Seg Fault in Compare Function
    By tytelizgal in forum C Programming
    Replies: 1
    Last Post: 10-25-2008, 03:06 PM
  2. Replies: 1
    Last Post: 03-06-2005, 09:12 AM
  3. Wrong Output
    By egomaster69 in forum C Programming
    Replies: 7
    Last Post: 01-28-2005, 05:44 PM
  4. Word COM problem
    By UnclePunker in forum Windows Programming
    Replies: 6
    Last Post: 01-06-2005, 10:51 AM
  5. MS Word Problem
    By golfinguy4 in forum Tech Board
    Replies: 8
    Last Post: 06-22-2003, 01:33 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21