smart use of std:algorithm

This is a discussion on smart use of std:algorithm within the C++ Programming forums, part of the General Programming Boards category; Simplified: I have one vector <known_names> and a second one <new_names>. Both vectors are sorted and unique. I want to ...

  1. #1
    EOP
    EOP is offline
    Registered User
    Join Date
    Jan 2008
    Posts
    45

    smart use of std:algorithm

    Simplified:

    I have one vector <known_names> and a second one <new_names>.
    Both vectors are sorted and unique.

    I want to find all occurrences of any *new_names where
    1. if *new_names is already present in <known_names>, it should be added to a third vector <already_known_names>
    2. if *new_names is not yet present, it should be added to a fourth vector <newly_added>

    It has to be accomplished without nested loops, as I want to use just std:algorithm in a smart way.

    I didn't find a smart combination of std:algorithms that satisfies my needs up to now.
    Neither any of the remove... nor the for_each functions seem to support my needs.

    Any suggestions would be highly appreciated.

    btw. the lists have a lot of names. At least 100.000+.

  2. #2
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    I do not know what sort of sorting you may want to perform... so check this out.

  3. #3
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    I don't know what's wrong with nested loops. Off the top of my head:
    1. Get an iterator to point to the start of <known_names>.
    2. Run a for_each through <new_names>
    3. While (current new name > current known name) move the iterator forward.
    4. If match, add to already_known; if not, add newly_added.
    You should only end up going through each list once, and I don't see how to do much better than that.

  4. #4
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,636
    This is one possible solution:
    1. Sort <known_names>.
    2. Sort <new_names>.
    3. Use std::set_difference with <known_names> and <new_names> to create <already_known_names>.
    4. Use std::set_difference with <new_names> and <known_names> to create <newly_added>.

    EDIT:
    Oh wait, the input vectors are sorted, so you can ignore #1 and #2.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #5
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    And yet again, I am defeated by C++. (Although that does look much like what I suggested, but why not use the built-in version?)

  6. #6
    EOP
    EOP is offline
    Registered User
    Join Date
    Jan 2008
    Posts
    45
    Quote Originally Posted by laserlight View Post
    This is one possible solution:
    1. Sort <known_names>.
    2. Sort <new_names>.
    3. Use std::set_difference with <known_names> and <new_names> to create <already_known_names>.
    4. Use std::set_difference with <new_names> and <known_names> to create <newly_added>.

    EDIT:
    Oh wait, the input vectors are sorted, so you can ignore #1 and #2.
    I hoped to get a reply by you!

    TY and regards from Germany. (I love you secretly for a long time ).

    btw. I didn't find any idea to answer master5001's ingenious reply without being insulting.

    Cheers and TY for the quick (and partially useful) replies.

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,636
    Quote Originally Posted by tabstop
    And yet again, I am defeated by C++. (Although that does look much like what I suggested, but why not use the built-in version?)
    Actually, your suggested algorithm just makes one pass, whereas using set_difference means two passes. So this is the trade-off between a hand crafted algorithm and implementation that should be almost twice as fast versus a generic algorithm whose implementation is well tested and probably well optimised in its own right.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    EOP
    EOP is offline
    Registered User
    Join Date
    Jan 2008
    Posts
    45
    Quote Originally Posted by tabstop View Post
    I don't know what's wrong with nested loops. Off the top of my head:
    1. Get an iterator to point to the start of <known_names>.
    2. Run a for_each through <new_names>
    3. While (current new name > current known name) move the iterator forward.
    4. If match, add to already_known; if not, add newly_added.
    You should only end up going through each list once, and I don't see how to do much better than that.
    Thanks, but I already tested the "loop approach" and it takes up to twice the time (and even more) than std:algorithms.

    Maybe I could speed it up by compiling it to .asm and optimizing the .asm file, but I did this the last time in the late 80s and early 90s when puters were quite slow.

  9. #9
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Erf? Your compiler probably optimizes in ways your granny only used to dream about. I, as many of you already well know, do enjoy writing code in assembler and doing hand optimizations. But I highly recommend against this approach outside of specific times when you are damn sure you are smarter than your compiler or for educational purposes.

    As for the speed issue, theoretically the loops should be faster. So your code was probably designed poorly.

  10. #10
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    Just 'cause I was feeling protective of my idea, I compared the two on 200000 random four-character strings (old and new each). Neither one took very long, but mine took about 2/3 as long on this particular machine with this particular compiler, with fake data, consult doctor before use, your mileage may vary, etc.

    One little gotcha that I found with the std::algorithm solution to take note of if you go that route is that the "answer" vectors must be sized ahead of time -- an iterator to the end of the data is returned by set_difference, but the vector is not resized or anything, so you're going to have to store that iterator, or use that to chop off, or whatever.

  11. #11
    Registered User
    Join Date
    Apr 2006
    Posts
    2,023
    Quote Originally Posted by tabstop View Post
    One little gotcha that I found with the std::algorithm solution to take note of if you go that route is that the "answer" vectors must be sized ahead of time -- an iterator to the end of the data is returned by set_difference, but the vector is not resized or anything, so you're going to have to store that iterator, or use that to chop off, or whatever.
    Just use an insertion iterator like back_inserter.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  12. #12
    Registered User
    Join Date
    Jan 2005
    Posts
    7,317
    >> Just use an insertion iterator like back_inserter.
    Although if you're concerned about efficiency a reserve may still be useful.

  13. #13
    EOP
    EOP is offline
    Registered User
    Join Date
    Jan 2008
    Posts
    45
    Quote Originally Posted by master5001 View Post
    Erf? Your compiler probably optimizes in ways your granny only used to dream about... blah
    What about e.g. using loopnz instead of compiler generated loops?

  14. #14
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,185
    Can we tell I'm up to about page 20 in Josuttis? Alright, back_inserter works too. Thanks.

    And to make a legitimate contribution:
    I don't think <known_names> - <new_names> is going to give <already_known_names>. We need to do the second part first to get <newly_added>, and then do <new_names> - <newly_added> to get <duplicates>. (Or at least that's how I did it in my test.)

  15. #15
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    What makes you think your compiler is not aware of every opcode for your CPU and knows how to rearrange data to be optimal for loops. To be honest, in assembler writing an optimal loop is probably the hardest thing you can do. A compiler is more apt to do these things than you or me.

Page 1 of 2 12 LastLast
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Which smart pointer to use?
    By leeor_net in forum C++ Programming
    Replies: 4
    Last Post: 04-13-2009, 04:29 AM
  2. smart pointer release
    By George2 in forum C++ Programming
    Replies: 24
    Last Post: 02-13-2008, 07:51 AM
  3. can not create Visual C++ Smart Device project
    By George2 in forum Tech Board
    Replies: 0
    Last Post: 11-26-2006, 05:20 AM
  4. Intelligent vs Smart
    By jverkoey in forum A Brief History of Cprogramming.com
    Replies: 17
    Last Post: 07-18-2004, 01:42 PM
  5. Using a smart pointer for all objects
    By phalc in forum C++ Programming
    Replies: 1
    Last Post: 03-28-2004, 08:23 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21