Thread: What NOT to do with std::vector.data() and push_back()

  1. #1
    Registered User
    Join Date
    Jul 2014
    Location
    Calgary, Canada
    Posts
    29

    What NOT to do with std::vector.data() and push_back()

    At one time or another we've probably all run across old code that has functions that do internal dynamic memory allocation using this type of setup:

    int func(int** pointer);

    Inside the function there will be a (*pointer) = new int[N] statement, or a malloc(), to allocate memory. Then the function return value will give you the dimension of (*pointer), after the allocation.

    Nowadays of course std::vector is a much better idea, but if you want to use std::vector in your calling code and also keep using func(), you have two choices:
    1. Update func() to use std::vectors instead of pointers.
    2. Leave func() the way it is, and copy the output elements of (*pointer) into the elements of an std::vector.
    Option (1) could turn out to be a headache, and (2) is going to slow down your code.

    When I first discovered std::vector::data(), I immediately wondered whether there might be some way of using it to drop the memory that func() allocates directly into an std::vector, without any copying being necessary -- or any major changes to func() being needed.

    So I tried two approaches today, just to see if it was possible to do it. Both of these use placement new (which, by the way, I had never used before today).

    The first approach turned out to be one of the worst ideas I have had in a long time. The unnerving thing about it is that it compiled in GCC 4.7.3 without throwing any warnings (even with -Wall turned on), and ran in Valgrind without throwing any access errors.

    I think the root problem is probably that push_back() itself uses placement new in some fashion, and it isn't expecting the source object to be occupying the same memory location as the object being constructed.

    Mostly I'm posting this as a cautionary note, because that approach seems like the kind of thing other people might try to hack together; and, because it compiles and runs in the debugger, it looks like it works.

    Code:
    #include <iostream>
    #include <vector>
    
    
    // This class is here only to let us build arrays and std::vectors of its
    //   objects. All it contains is an embedded integer.
    // To get rid of it, just comment it out and replace it with:
    // using Int = int;
    
    
    class Int
    {
     public:
      int i;
    
    
      Int() : i(0)
      {
        std::cout<<"Int()"<<std::endl;        // Default constructor
      }
    
    
      explicit
      Int(int j) : i(j)        
      {
        std::cout<<"Int(int)"<<std::endl;     // Constructor from int
      }
    
    
      Int(const Int& that) : i(that.i)
      {
        std::cout<<"Int(Int) "<<this<<        // Copy constructor
                 " from "<<&that<<std::endl; 
      }
    
    
      Int& operator=(const Int& that)         // Copy assignment operator
      {
        if (this != &that)
        {
          this->i = that.i;
    
    
          std::cout<<"Int = Int"<<std::endl;
        }
        return *this;
      }
    
    
      Int& operator=(int thing)               // Assignment from int
      {
        this->i = thing;
    
    
        std::cout<<"Int = int"<<std::endl;
    
    
        return *this;
      }
    
    
      operator int()                          // Integer operator to make it
      {                                       //   easier to use the above typedef
        return i;
      }
    };
    
    
    // Simple memory allocation by pointer, inside a legacy-code-type function.
    
    
    int allocate_03(Int** pointer)
    {
      int L = 3;              // Array length; not known to the calling function.
                              // However, we might know a maximum possible value.
    
    
      *pointer = new Int[L];  // Allocate the array
    
    
      for (int i=0; i<L; ++i) // Do something with the elements -- this could be
        (*pointer)[i] = i;    //   really long; something we don't want to mess
                              //   with in practice.
    
    
      return L;               // Return the length
    }
    
    
    // The above with placement new added. Only two changes are needed, which is
    //   nice. The caveat is that we had better be sure 'place' has enough room.
    
    
    int allocate_11(Int** pointer, Int* place) // <-- Change #1
    {
      int L = 3;
    
    
      *pointer = new(place) Int[L];            // <-- Change #2
    
    
      for (int i=0; i<L; ++i)
        (*pointer)[i] = i;
    
    
      return L;
    }
    
    
    int main()
    {
      // The "legacy" version of the function would be used along these lines.
    
    
      std::cout<<"Legacy version:\n\n";
    
    
      Int* p03 = NULL;
    
    
      int  N03 = allocate_03(&p03); // ... do things with p03...
    
    
      delete[] p03;
    
    
      // Now for the modified version. The objective here is to use placement new
      //   in allocate_11() to drop the array elements directly into the reserved
      //   memory of an std::vector<Int>, instead of performing element-by-
      //   element copying of the output of allocate_03 into the std::vector.
      //
      // I tried two approaches. The first one I know is a really bad idea, and
      //   the second may also have problems.
      // I've put them inside their own scopes for clarity.
    
    
      { // Approach 1:
        // Define V, an std::vector<Int> with size 0 and capacity C. 
        // By reserving memory, but not initializing any elements, we avoid
        //   calling default constructors.
    
    
        std::cout<<"\nModified version, take 1:\n\n";
    
    
        std::vector<Int> V;
        int C = 7;
        V.reserve(C);
    
    
        // Here allocate_11 uses placement new to drop an array of Ints straight
        //   into V.data(). There are N11 of these in total.
        // Assumption: (N11 <= C), otherwise we're hooped.
    
    
        Int* p11 = nullptr;
        int  N11 = allocate_11(&p11, V.data());
    
    
        // The problem is that V.size() is still 0 at this point.
        // We can't V.resize(N11) here either. That would zero out all of the
        //   array elements that were returned by allocate_11.
        //
        // So I got the bright idea of using push_back(), and bounds-unchecked V[]
        //   element access, to "add" the elements one at a time inside a loop.
        //   This was my way of "fixing" V.size().
    
    
        for (int i=0; i<N11; ++i)
        { 
          V.push_back(V[i]);
      
          std::cout<<"V["<<i<<"]="<<int(V[i])<<"; V.size()="<<V.size()<<std::endl;
        }
    
    
        // The above is a *really* awful idea though, for at least three reasons.
        //
        //   (1) Accessing elements past V.size()-1 is undefined behaviour.
        //         I knew this from the beginning, but thought it might somehow
        //         still be "okay", since V.capacity() should be large enough to
        //         contain all of the allocated memory.
        //   (2) The push_back() will call the copy constructor of every element.
        //         That defeats the purpose of doing all this in the first place.
        //         We might as well have copied over the output of allocate_03.
        //   (3) More ominously, in the copying operations described in (2),
        //         the source object turns out to be at the same memory location
        //         as the object under construction. Yikes!!!!
        //       I suppose one could copy the (this != &that) guards from the 
        //         assignment operator into the copy constructor. Avoiding this
        //         idea like the plague would be a better solution, though.
      }
    
    
      { // Approach 2:
        // Define V, an std::vector<Int> with size C and capacity C.
        // The extra overhead here is that the default constructors of all elements
        //   from 0 to (C-1) get called straight away.
    
    
        std::cout<<"\nModified version, take 2:\n\n";
    
    
        std::vector<Int> V;
        int C = 7;
        V.resize(C);
    
    
        // Place the array created in allocate_11 at V.data().
        // Assumption: (N11 <= C), otherwise we're hooped.
    
    
        Int* p11 = nullptr;
        int  N11 = allocate_11(&p11, V.data());
    
    
        // Resize the vector down to length N11.
    
    
        V.resize(N11);
    
    
        for (int i=0; i<N11; ++i)
        { 
          std::cout<<"V["<<i<<"]="<<int(V[i])<<"; V.size()="<<V.size()<<std::endl;
        }
      }
    }

  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Mostly I'm posting this as a cautionary note, because that approach seems like the kind of thing other people might try to hack together; and, because it compiles and runs in the debugger, it looks like it works.
    O_o

    You should get more practice using a debugger. Your use of placement `new' is heinous. You are using placement `new' to construct an object where an object already lives. The problems you've referenced are almost insignificant in the face of repeat construction. A destructor should be called for every object successfully created with a constructor, but you aren't calling the destructor for the objects that live in the memory you reuse. You just can't go around telling lies about the lifetime of objects.

    You should, in practice, pretend placement `new' doesn't exist outside of writing allocators or similar.

    [Edit]
    To be clear, my code is also broken. My code just makes the source of a major problem clear without needing to understand a debugger well enough to see constructors and destructor being called.
    [/Edit]

    Your `Int' example doesn't really do anything. Use my code which pretends to allocate resources.

    Soma

    Code:
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    
    const unsigned int kObjects(5);
    
    unsigned int gObjects(0);
    
    unsigned int gResources(0);
    
    struct STest
    {
        STest():
            m(rand() % 32)
        {
            ++gObjects;
            gResources += m;
        }
        STest
        (
            const STest & fOther
        ):
            m(fOther.m)
        {
            ++gObjects;
            gResources += m;
        }
        ~STest()
        {
            --gObjects;
            gResources -= m;
        }
        STest & operator =
        (
            const STest &
        )
        {
        }
        unsigned int m;
    };
    
    /************************************************/
    
    void Broken
    (
        STest * fStorage
    )
    {
        new(fStorage) STest[kObjects];
    }
    
    /************************************************/
    
    #include <vector>
    
    int main()
    {
        std::srand(std::time(0));
        {
            std::vector<STest> s(kObjects);
            Broken(s.data()); // You can remove this line to prevent the repeat construction.
        }
        {
            std::vector<STest> s;
            s.reserve(kObjects);
            Broken(s.data()); // You can remove this line to prevent the repeat construction.
            for(int c(0); c < kObjects; ++c)
            {
                STest sIgnored; // I didn't want access problems to disguise the source of the error.
                s.push_back(sIgnored);
            }
        }
        std::cout << "objects not destroyed: " << gObjects << '\n';
        std::cout << "resources not reclaimed: " << gResources << '\n';
        return(0);
    }
    Last edited by phantomotap; 01-22-2015 at 09:19 PM.
    “Salem Was Wrong!” -- Pedant Necromancer
    “Four isn't random!” -- Gibbering Mouther

  3. #3
    Registered User
    Join Date
    Jul 2014
    Location
    Calgary, Canada
    Posts
    29
    Agreed on all counts.

    Thanks for posting this example. I just went over it and now I see your point about repeat construction. Yikes. I had been wondering whether there were any other flaws in my reasoning, and that one is a real whopper.

    As far as being a cautionary tale is concerned, there are probably two morals to this story:

    (1) Don't use anything that you don't thoroughly understand to create a hacked-together "workaround" for a problem. In my case, this was placement new (which I intend to avoid from now on).

    (2) Don't mess around with undefined behaviour. Even if something looks like it "should be safe", there could be unseen problems waiting to pounce.

    By the way, when I said "run in the debugger" above, I really meant "run in the access checker", i.e. Valgrind.

  4. #4
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Whoa, Grumpulus and grumpy? Hmm... grumpy, is this your apprentice by any chance?

  5. #5
    Citizen of Awesometown the_jackass's Avatar
    Join Date
    Oct 2014
    Location
    Awesometown
    Posts
    269
    Quote Originally Posted by MutantJohn
    Whoa, Grumpulus and grumpy? Hmm... grumpy, is this your apprentice by any chance?
    Maybe he is Grumpy's kid son? xD
    "Highbrow philosophical truth: Everybody is an ape in monkeytown" --Oscar Wilde

  6. #6
    Unregistered User Yarin's Avatar
    Join Date
    Jul 2007
    Posts
    2,158
    Option (1) could turn out to be a headache
    No, not at all, it would be very easy!

    Take for instance
    Code:
    int allocate_03(Int** pointer)
    {
      int L = 3;              // Array length; not known to the calling function.
                              // However, we might know a maximum possible value.
     
     
      *pointer = new Int[L];  // Allocate the array
     
     
      for (int i=0; i<L; ++i) // Do something with the elements -- this could be
        (*pointer)[i] = i;    //   really long; something we don't want to mess
                              //   with in practice.
     
     
      return L;               // Return the length
    }
    The upgraded code would look like
    Code:
    int allocate_03(vector<int>* pointer)
    {
      int L = 3;              // Array length; not known to the calling function.
                              // However, we might know a maximum possible value.
     
     
      pointer->resize(L);  // Allocate the array
     
     
      for (int i=0; i<L; ++i) // Do something with the elements -- this could be
        (*pointer)[i] = i;    //   really long; something we don't want to mess
                              //   with in practice.
     
     
      return L;               // Return the length
    }

    In this way, you'd only need to change the function's arguments, replace malloc()/new with a resize, and replace any free()/delete with resize(0).
    Last edited by Yarin; 01-23-2015 at 02:01 PM.

  7. #7
    Registered User
    Join Date
    Jul 2014
    Location
    Calgary, Canada
    Posts
    29
    That does indeed seem to be the way to go with the original problem. Thanks, Yarin.


    I was so locked into the idea of somehow using the data() pointer to do this that I didn't even notice the nice clean solution that was staring me right in the face. Probably because, at first, I was trying to figure out whether the original goal could be accomplished without modifying the legacy function at all.


    This quickly turned out to be impossible because std::vector::data() doesn't return the internal vector pointer by reference. It returns by value, so trying to do something like passing &(V.data()) to the first argument of allocate_03 is meaningless and will make the compiler scream at you.


    Also, I hadn't noticed there was someone else here with a similar user name. Oh well.

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Don't modify internal data structures. That should be a given. You should only modify stuff returned by some public member function if that member function allows you to do that.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Vector push_back strangeness
    By Ducky in forum C++ Programming
    Replies: 11
    Last Post: 08-07-2014, 10:55 AM
  2. Vector<char> push_back
    By Ducky in forum C++ Programming
    Replies: 4
    Last Post: 06-15-2013, 05:52 AM
  3. Cannot push_back string at the end of vector
    By fnoyan in forum C++ Programming
    Replies: 1
    Last Post: 05-23-2013, 10:36 PM
  4. Push_back not storing data to vector
    By csonx_p in forum C++ Programming
    Replies: 13
    Last Post: 07-27-2008, 04:38 AM
  5. vector.push_back(anotherVector)
    By boojus in forum C++ Programming
    Replies: 2
    Last Post: 11-22-2003, 05:07 PM