Thread: Inserting into vector in map of string and vector<string>

  1. #31
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by laserlight View Post
    For example:
    Code:
    pair<it_t, bool> pr = a_map.insert(make_pair("A String", vector_str_t()));
    pr.first->second.push_back("#1");
    pr.first->second.push_back("#2");
    pr.first->second.push_back("#3");
    You can actually see it used in whiteflags' canonical example of how operator[] would be implemented with insert.
    I didn't think of passing the vector<string> constructor.
    I guess you're right.
    http://codepad.org/7IEbweSe
    Indeed, but as there was no check for a failed (i.e., attempted duplicate) insertion, it was not very useful anyway, heh.
    Like I just mentioned in the previous post, checking the value of the bool is not very useful with map<string, vector_str_t>s, since the mapped value is a vector<string>.
    Last edited by Programmer_P; 01-06-2011 at 10:02 AM.
    I'm an alien from another world. Planet Earth is only my vacation home, and I'm not liking it.

  2. #32
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Programmer_P
    Like I just mentioned in the previous post, checking the value of the bool is not very useful with map<string, vector_str_t>s, since the mapped value is a vector<string>.
    As you state it right now, your reason is rubbish because whether the attempted insertion is a duplicate depends on the key, not the mapped value, so the fact that the mapped value is a vector<string> has no bearing on whether checking the bool is useful.

    For example, let's suppose that you take my advice and write something like this:
    Code:
    {
        vector_str_t& batch = a_map["A String"];
        batch.push_back("#1");
        batch.push_back("#2");
        batch.push_back("#3");
    }
    {
        vector_str_t& batch = a_map["Another String"];
        batch.push_back("#4");
        batch.push_back("#5");
        batch.push_back("#6");
    }
    // A dozen other batches
    // ...
    {
        vector_str_t& batch = a_map["A String"];
        batch.push_back("#43");
        batch.push_back("#44");
        batch.push_back("#45");
    }
    Now, what will happen is that the key "A String" will be mapped to a vector with 6 strings. Logically, you should have grouped all these push_back calls into one batch, but it turns out that this is say, a copy/paste error and the second "A String" batch should actually have a different key. If you had used insert instead of operator[] and then checked the bool, you would have detected this mistake immediately, but as-is the mistake would only be detected later, if it is detected at all.

    Of course, it may well be that this is a feature, not a bug, because you really did intend two batches to have the same key, but that may also make maintenance harder.

    Incidentally, reading this thread more carefully, I observe that iMalc suggested in post #17 what I suggested in post #26.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #33
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Like I just mentioned in the previous post, checking the value of the bool is not very useful with map<string, vector_str_t>s, since the mapped value is a vector<string>.
    A good general rule of thumb is if someone thought it important enough to make a method return a value then it is important enough for you to do something with the value.

    This issue does not come down to speed, style, etc. but comes down to being able to gracefully recover or perhaps notify a client of an errant condition. Checking return values is extremely important in production code and it definitely cannot hurt in your own projects. In the end it is a very good habit to develop and someday it might just prevent you from introducing what might be a very hard to find bug which ends up costing time and money.

    Also note the design of insert. It first checks to see if the specified key is unique and if it is it will go ahead and perform the insertion else it will not. In order for an end user to do this using other portions of the std::map interface would require a find() and then an insert(). However note that b/c insert essentially does a find() first that if you were to find() and insert() you are performing the search for the key twice which is not efficient. The returned iterator can also be useful if you then want to have another container that references directly into the map so you can prevent the key lookup even though the lookup is very fast. There are times when the lookup is too costly to perform and this is when the returned iterator from insert is quite useful b/c you can store this iterator elsewhere and essentially jump right into the map in constant time and grab your object or data without ever performing the lookup.

    But regardless of specific implementation details of containers, how you could use them, etc., checking return values when they are available is never a bad practice. I'm not being pedantic here either because I had to learn this lesson the hard way. Failure to check return values is a big cause of bugs and yet they are the easiest bugs to avoid.
    Last edited by VirtualAce; 01-06-2011 at 05:42 PM.

  4. #34
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by laserlight View Post
    As you state it right now, your reason is rubbish because whether the attempted insertion is a duplicate depends on the key, not the mapped value, so the fact that the mapped value is a vector<string> has no bearing on whether checking the bool is useful.

    For example, let's suppose that you take my advice and write something like this:
    Code:
    {
        vector_str_t& batch = a_map["A String"];
        batch.push_back("#1");
        batch.push_back("#2");
        batch.push_back("#3");
    }
    {
        vector_str_t& batch = a_map["Another String"];
        batch.push_back("#4");
        batch.push_back("#5");
        batch.push_back("#6");
    }
    // A dozen other batches
    // ...
    {
        vector_str_t& batch = a_map["A String"];
        batch.push_back("#43");
        batch.push_back("#44");
        batch.push_back("#45");
    }
    Now, what will happen is that the key "A String" will be mapped to a vector with 6 strings. Logically, you should have grouped all these push_back calls into one batch, but it turns out that this is say, a copy/paste error and the second "A String" batch should actually have a different key. If you had used insert instead of operator[] and then checked the bool, you would have detected this mistake immediately, but as-is the mistake would only be detected later, if it is detected at all.

    Of course, it may well be that this is a feature, not a bug, because you really did intend two batches to have the same key, but that may also make maintenance harder.

    Incidentally, reading this thread more carefully, I observe that iMalc suggested in post #17 what I suggested in post #26.
    You're right about the bolded part, but that's not what I was talking about at all. But its not worth arguing about.
    My real point was, the approach using map[key].push_back(string) virtually tops all versions of using insert(), since there is no risk of accidentally trying to insert a duplicate key with this approach and hence no reason to use insert() at all, if you're saying its for the bool value which indicates if the insert operation was successful or not. The only advantage of insert() over operator[] with a map<string, vector_str_t> is the possible performance increase if you use the reference approach with insert(). And of course, the only advantage of using insert() over the [] operator with the reference approach, is the use case you just described (where you might accidentally after a copy/paste operation, insert strings into the wrong key's vector<string> mapped value object). Otherwise, according to words spoke forth earlier in this thread, insert() will have operated with the same rate of time as the [] operator.
    Last edited by Programmer_P; 01-06-2011 at 10:43 PM.
    I'm an alien from another world. Planet Earth is only my vacation home, and I'm not liking it.

  5. #35
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Programmer_P
    The bool value in the pair return by insert is for cases in where a new element (both key and mapped value) is attempted to be inserted/created in map, which ends up being true if the element was inserted (which happens if key didn't already exist in map), or else false if element was not inserted.
    That is correct.

    Quote Originally Posted by Programmer_P
    All you did was used two different references for the same vector<string> mapped value object, to push_back strings to it. Not what I was talking about at all.
    Ah, but it is what I am talking about. I am saying that if this was a mistake, using operator[] would leave it undetected. Of course, you could make mistakes in other ways, e.g., providing an entirely incorrect key; nonetheless an accidentally repeated key is one potential problem that using insert with a check would detect.

    Quote Originally Posted by Programmer_P
    My initial point was, this approach virtually tops all versions of using insert(), since there is no risk of accidentally trying to insert a duplicate key with this approach
    Are you stating this as a guarantee or as an observation? As a guarantee means that by some other way, you know for sure that there is no such risk. As an observation means that you believe that there is no such risk because the mechanics of the operation makes it so. However, that would be false, because std::map guarantees that there will be no duplicate keys, but there is always a risk that an attempt will be made to insert with a duplicate key. operator[] merely ignores the attempted duplicate and returns a reference to the existing mapped object.

    The reason why I am willing to recommend the use of operator[] with a reference to the mapped vector is that it seems that you are willing to accept this risk of having multiple batches of push_back calls to the same vector, even though they should be to different vectors.

    Quote Originally Posted by Programmer_P
    The only advantage at all of insert() over operator[] with a map<string, vector_str_t> is the possible performance increase if you use the reference approach with insert().
    That is false. There is no real performance advantage in using insert over operator[] with the reference approach. If you don't specially keep a reference to the inserted object, then insert has the advantage because the return value has an iterator that points to the inserted object.
    Last edited by laserlight; 01-06-2011 at 10:44 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #36
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by laserlight View Post
    That is correct.
    *bows mockingly* Thank you.
    Ah, but it is what I am talking about. I am saying that if this was a mistake, using operator[] would leave it undetected. Of course, you could make mistakes in other ways, e.g., providing an entirely incorrect key; nonetheless an accidentally repeated key is one potential problem that using insert with a check would detect.
    I understand that (except for the entirely incorrect key part...im not sure how that applies).
    Are you stating this as a guarantee or as an observation? As a guarantee means that by some other way, you know for sure that there is no such risk. As an observation means that you believe that there is no such risk because the mechanics of the operation makes it so. However, that would be false, because std::map guarantees that there will be no duplicate keys, but there is always a risk that an attempt will be made to insert with a duplicate key. operator[] merely ignores the attempted duplicate and returns a reference to the existing mapped object.
    As an observation. I don't know enough about the language to make guarantees.
    However, I think my observation is correct: using map[key].push_back() operations only with a map<string, vector_str_t> will return a reference to the existing mapped object if key already exists, hence why there is no risk of replacing an existing mapped value. You're just adding strings to the mapped value. That's why I said with vector<string> types as the mapped value, there is no real reason to check if the insertion operation was correct, if using the syntax I already mentioned, for each time a new string key is inserted with the [] operator, a new vector<string> object is created, and then you immediately push_back strings to that vector. And so the next time you use that syntax with the same key, you just end up adding strings to that same vector.
    The reason why I am willing to recommend the use of operator[] with a reference to the mapped vector is that it seems that you are willing to accept this risk of having multiple batches of push_back calls to the same vector, even though they should be to different vectors.
    Yes, I have considered it, and I have no real need for insert() with the particular class's maps I'll be writing, and doing stuff with, since there will only be a few string keys (associated with single vector<string> mapped values with many strings), and I'm confident that I wont end up accidentally overwriting an existing element.
    So I think your last code example's approach is what I'll be using (only for the slight performance benefits spoken about).
    That is false. There is no real performance advantage in using insert over operator[] with the reference approach.
    I wasn't saying that there was any performance advantage of using insert over operator[] with the reference approach, per se. I was saying that there is a performance advantage (which is the same for both insert and the [] operator) using the reference approach over using map[key].push_back(string) multiple times, and that statement's based off of an earlier poster's comment (so blame him ).
    If you don't specially keep a reference to the inserted object, then insert has the advantage because the return value has an iterator that points to the inserted object.
    And yet the return value is a pair<iterator, bool> struct object which requires creation of a new pair object in memory, and then use of its members, so I was thinking that maybe that might increase the time there over using map[key].push_back(string) since the [] operator gives you direct access (by a reference) to the mapped value. But I could be wrong about that (and probably am).
    Last edited by Programmer_P; 01-06-2011 at 11:26 PM.
    I'm an alien from another world. Planet Earth is only my vacation home, and I'm not liking it.

  7. #37
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by Bubba View Post
    A good general rule of thumb is if someone thought it important enough to make a method return a value then it is important enough for you to do something with the value.

    This issue does not come down to speed, style, etc. but comes down to being able to gracefully recover or perhaps notify a client of an errant condition. Checking return values is extremely important in production code and it definitely cannot hurt in your own projects. In the end it is a very good habit to develop and someday it might just prevent you from introducing what might be a very hard to find bug which ends up costing time and money.

    Also note the design of insert. It first checks to see if the specified key is unique and if it is it will go ahead and perform the insertion else it will not. In order for an end user to do this using other portions of the std::map interface would require a find() and then an insert(). However note that b/c insert essentially does a find() first that if you were to find() and insert() you are performing the search for the key twice which is not efficient. The returned iterator can also be useful if you then want to have another container that references directly into the map so you can prevent the key lookup even though the lookup is very fast. There are times when the lookup is too costly to perform and this is when the returned iterator from insert is quite useful b/c you can store this iterator elsewhere and essentially jump right into the map in constant time and grab your object or data without ever performing the lookup.

    But regardless of specific implementation details of containers, how you could use them, etc., checking return values when they are available is never a bad practice. I'm not being pedantic here either because I had to learn this lesson the hard way. Failure to check return values is a big cause of bugs and yet they are the easiest bugs to avoid.
    I agree with everything you said here, but in the code I'm working on at this present time, I'm not going to bother checking that bool, the reason being that there are only a few elements in each map, and I'm pretty sure I will not attempt to insert a duplicate key. I prefer to write code that's the most efficient for a given use case, giving adequate thought to each factor of the program (both the writing of it, and the performance of the program once it is running), so in cases like this, where I don't see that checking the bool is needed, I'm not going to.
    If I write another program, and see a need for this, I'll include the code for it then. But I don't like writing unnecessary code. Simple as that (no offense).
    Last edited by Programmer_P; 01-06-2011 at 11:31 PM.
    I'm an alien from another world. Planet Earth is only my vacation home, and I'm not liking it.

  8. #38
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Programmer_P
    I understand that (except for the entirely incorrect key part...im not sure how that applies).
    Let's say you have 26 vectors, and you want them to be associated with keys in the range [A, Z]. Suppose, by accident, you write "C" as "c". This would be an entirely incorrect key. Unfortunately, there is little you can do about it other than to manually check. (That said, you could write an automated test to check, but the process of writing the test is um, manual labour.)

    Quote Originally Posted by Programmer_P
    However, I think my observation is correct: using map[key].push_back() operations only with a map<string, vector_str_t> will return a reference to the existing mapped object if key already exists, hence why there is no risk of replacing an existing mapped value.
    Yes, but there is the risk of using an existing mapped value when you want to be inserting a new mapped value with a new key.

    Quote Originally Posted by Programmer_P
    Yes, I have considered it, and I have no real need for insert() with the particular class's maps I'll be writing, and doing stuff with, since there will only be a few string keys (associated with single vector<string> mapped values with many strings), and I'm confident that I wont end up accidentally overwriting an existing element.
    So I think your last code example's approach is what I'll be using (only for the slight performance benefits spoken about).
    Then you're good to go

    Quote Originally Posted by Programmer_P
    And yet the return value is a pair<iterator, bool> struct object which require use of its member functions, so I was thinking that maybe that might increase the time there over using map[key].push_back(string) since the [] operator gives you direct access (by a reference) to the mapped value. But I could be wrong about that (and probably am).
    Given this prototype for the member operator[] of std::map:
    Code:
    T& operator[](const key_type& x);
    This is its canonical implementation:
    Code:
    return (*((insert(make_pair(x, T()))).first)).second;
    Observe that insert is used, and the members of the struct returned is accessed. An implementation is not strictly required to implement operator[] in this way, but whatever implementation used must essentially do the same thing.
    Last edited by laserlight; 01-06-2011 at 11:38 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  9. #39
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    I was saying that there is a performance advantage (which is the same for both insert and the [] operator) using the reference approach over using map[key].push_back(string) multiple times, and that statement's based off of an earlier poster's comment (so blame him ).
    If that is the case then why did you gloss over this here
    The only advantage at all of insert() over operator[] with a map<string, vector_str_t> is the possible performance increase if you use the reference approach with insert().
    and subsequently attribute the blame to me? I mean, both quoted statements mean rather different things and I have nothing to do with what you think you say.

  10. #40
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by whiteflags View Post
    If that is the case then why did you gloss over this here

    and subsequently attribute the blame to me? I mean, both quoted statements mean rather different things and I have nothing to do with what you think you say.
    I meant the same thing with both sets of statements. I just worded it a little differently the first time, but I can see how one might misunderstand my first version. I apologize for the lack of clarity the first time, but I stand by what I meant by it.
    btw, I meant blame you if you were wrong about there being a performance advantage.
    Last edited by Programmer_P; 01-06-2011 at 11:50 PM.
    I'm an alien from another world. Planet Earth is only my vacation home, and I'm not liking it.

  11. #41
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by laserlight View Post
    Let's say you have 26 vectors, and you want them to be associated with keys in the range [A, Z]. Suppose, by accident, you write "C" as "c". This would be an entirely incorrect key. Unfortunately, there is little you can do about it other than to manually check. (That said, you could write an automated test to check, but the process of writing the test is um, manual labour.)


    Yes, but there is the risk of using an existing mapped value when you want to be inserting a new mapped value with a new key.


    Then you're good to go


    Given this prototype for the member operator[] of std::map:
    Code:
    T& operator[](const key_type& x);
    This is its canonical implementation:
    Code:
    return (*((insert(make_pair(x, T()))).first)).second;
    Observe that insert is used, and the members of the struct returned is accessed. An implementation is not strictly required to implement operator[] in this way, but whatever implementation used must essentially do the same thing.
    I see...
    Interesting. I didn't know that the [] operator used the insert() function.
    Ok, so yeah, it looks like you're right about insert() being faster.
    So, on second thought, I think I'll use the reference approach with the insert() function (but without checking the value of the bool ).
    I'm an alien from another world. Planet Earth is only my vacation home, and I'm not liking it.

  12. #42
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Programmer_P
    Ok, so yeah, it looks like you're right about insert() being faster.
    So, on second thought, I think I'll use the reference approach with the insert() function (but without checking the value of the bool )
    No point. You're going to do the work that operator[] does anyway.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  13. #43
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by laserlight View Post
    No point. You're going to do the work that operator[] does anyway.
    I guess you're right. I was thinking the following approach might be faster, but both approaches do basically the same thing, they just use different syntax.
    The speed should be the same.
    Code:
    #include <iostream>
    #include <map>
    #include <vector>
    #include <string> 
    
    using namespace std;
    
    int main() {
    
      typedef vector<string> vector_str_t;
      map<string, vector_str_t> a_map;
      vector_str_t& vct_ref = a_map.insert(make_pair("Batch1", vector_str_t())).first->second;
      vct_ref.push_back("String1");
      vct_ref.push_back("String2");
      vct_ref.push_back("String3");
      typedef map<string, vector_str_t>::iterator it_t;
      cout<< "Contents of map:\n\n";
      for (it_t it = a_map.begin(); it != a_map.end(); it++) {
            cout<< "Key value contents: " << it->first <<endl;
            vector_str_t vct = it->second;
            cout<< "Mapped value contents: ";
            for (unsigned int i = 0; i < vct.size(); i++) {
                  if (i != 0)
                      cout<< "\t\t       ";
                  cout<< vct.at(i) <<endl;
            }
      }
      
      return 0;
    
    }
    Last edited by Programmer_P; 01-07-2011 at 12:31 AM.
    I'm an alien from another world. Planet Earth is only my vacation home, and I'm not liking it.

  14. #44
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Programmer_P
    Even if I do this?
    Yes. Notice that your insert call with the .first->second is equivalent to the canonical implementation of operator[].
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  15. #45
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    The speed should be the same.
    Speed is not the issue here and should not be used as a determining factor for which approach you take. All of a sudden you switch to insert b/c you feel it is faster and while most of use are recommending you should use insert, speed is not the reason for that recommendation. If you choose insert due to speed then you have totally missed the point.

Popular pages Recent additions subscribe to a feed