Thread: vector.push_back bad_alloc problem

  1. #1
    Registered User
    Join Date
    Apr 2009
    Posts
    3

    vector.push_back bad_alloc problem

    Hi,
    I have a large vector of structures (each contains 9 doubles, 1 int, 1 bool). When I "push_back" one more entry (almost 17,000,000'th), I get a bad_alloc exception. I am attaching a small script where I reproduced the problem
    Code:
    #include<iostream>
    #include<vector>
    using namespace std;
    
    struct InclEvent{
      int incl_index;  
      double p_e;  
      double Q2;
      double W;  
      bool tagged;     // true/false tagged/untagged
      double p_spec; 
      //  double p_spec_raw;
      double cos_th_pq; 
      double Wstar;  
      double deltaz;
      double spec_theta;
      double spec_z;
    };
    
    int main ()
    {
    
     int get_res;
    
     InclEvent temp_evnt;
    
     int size=18000000;
    
     vector<InclEvent> Events;
     // Events.resize(size);
    
     
     for(uint ctr=0;ctr<size;ctr++){
       Events.resize(ctr+1);
       cout<<"ctr "<<ctr<<endl;
       temp_evnt.incl_index=ctr;
       temp_evnt.p_e = 1.1;      
       temp_evnt.Q2 = 1.2;
       temp_evnt.W  = 1.3;
       temp_evnt.tagged = false;
       temp_evnt.p_spec = -10.0;
       temp_evnt.cos_th_pq=-10.0;            
       temp_evnt.Wstar = -10.0;      
       temp_evnt.deltaz= -10.0;
    
       Events.at(ctr)=temp_evnt;
       //   Events.push_back(temp_evnt);
     }
    
     cin>>get_res;
     // delete Events;
    
     return 0;
    }
    Please, pay attention to commented out pieces, they describe different ways of getting/not getting the problem: if I allocate the "size" elements right away in the vector declaration, and assign to each using operator=, I am fine; if I do "resize" right away for the whole size, I'm fine. If I try to either push_back (uncomment push_back line and comment out resizing and assignment in the loop), or resize element by element, I get the exception.

    I am using g++: target i486-linux-gnu, gcc version 4.1.3.

    Please, help to resolve this.

  2. #2
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    You're out of memory. As you push the 17000001th element, the vector has already allocated a massive block of memory capable of holding 17000000 elements (at 88 bytes each - 1.4 GiB). A typical growth strategy is to double the storage on each reallocation, so the vector now tries to allocate memory for 34000000 elements. This is a massive 2.8 GiB - and it's in addition to the 1.4 GiB you've already allocated, because the vector needs to keep that block around so that it can copy elements over. There is no 32-bit system where you can have 4.2 GiB allocated at once - even if you had the whole address room, you couldn't go beyond 4 GiB, because that's what the 32-bit pointers can address. In fact, however, the OS typically only gives you 2 GiB of that, and that's already cluttered with code, the stack, and various other stuff. You're actually pretty lucky to succeed at the 1.4 GiB allocation.

    If you have a 64-bit system, a 64-bit OS, and the program compiled as 64-bit, you could succeed. But unless you actually have the physical RAM to back the allocation, the computer will start swapping and slow to a crawl.
    Last edited by CornedBee; 04-23-2009 at 08:35 AM.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    your structure is (approximately) 77 bytes in size. 17 million of them is about 1.3GB. what I'd guess is happening is that it's hitting its capacity at that point and trying to resize itself. as I understand it, most STL implementations try to allocate double their current size when they resize. so you have the 1.3GB currently in memory, and it's trying to allocate 2 x 1.3GB more for a total of 3.9GB, and that's probably well over the limit for your process. I think 32-bit windows only gives you a 2GB address space. I'm not sure what the limit is on other operating systems.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    When you push_pack on a vector, and the vector needs to grow, it essentially does something like this:
    Code:
     
    template <typename T> vector::push_back(...)
    {
       elements++;
       if (elements > capacity)
       {
          capacity += growth; // growth may be a constant or calculated based on current size.. 
          T *new_content = new T [capacity]; 
       }
       // ... copy content to new_content .. 
       swap(content, new_content);
       delete [] new_content;
    }
    Since content and new_content co-exist, you will need aproximately 2x the size of your vector content - that would be AT LEAST 80 bytes - 17 million elements of 80 bytes is 1.3GB of data - so you are using 2.6GB or thereabouts, not counting any other data or code. The absolute maximum of memory usage in 32-bit Linux is a little bit under 3GB.

    [Actually, looking at your struct, I think it takes up 96 bytes, since your int and bool are intermixed with the double data items - if you move them to the bottom, you will get a smaller structure by about 16 bytes - or about 260MB. Add 260MB to both of the the above 1.3GB, and we get about 1.6GB x 2 -> 3.2GB - outside what you can possibly achieve].

    Moving to a 64-bit version of Linux would possibly solve this problem.


    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    Registered User
    Join Date
    Apr 2009
    Posts
    3
    Thank you guys. I was not aware of this temporary doubling of the memory. I guess, I'll have to give up push_back in this case and do it some other way.

    Thank you for your help.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by santos03 View Post
    Thank you guys. I was not aware of this temporary doubling of the memory. I guess, I'll have to give up push_back in this case and do it some other way.

    Thank you for your help.
    Yes, if you can figure out what the size is supposed to be then that'll work.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #7
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    You can use reserve instead of resize, and then you can still use push_back if you want.

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Does the gigantic vector really need to be in memory? Right now you are coping with it, but if the data set gets much larger it will no longer fit in memory, even without this buffer-doubling issue.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I would start looking at reducing the size of that structure if possible.
    For example, do all of those doubles need to be double, or could some of them be float? In fact is a float more precision than some of those variables need, because you could use a class that packs a floating point number into a short, halving the memory usage again (I just so happen to have such a class).
    You may also be able to represent the int using 31 bits of a bitfield and store the bool in the remaining one bit.
    If any of those variables are very often the same as in other instances of the struct then perhaps you could take a paletteised approach where you store the unique values in another table and store only an index into that table in this struct.
    If any of the values can be derived from the other values, then simply omit them and calculate them as needed accessing them from a member function rather than a variable.
    After all that, check your alignment settings as you may be able to squeeze out a few more bytes by aligning to a smaller amount. Or just make sure that the variables in the struct are laid out from biggest to smallest.

    One way or another it should be possible to approximately halve that struct (or better).
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    You can ease (but not eliminate) the problem by using vector's reserve() method before the loop, if you can compute the maximum size in advance. This does the memory allocation in one step, without (often) allocating extra.

    More generally, however, it would be better to find a way of avoiding the need to keep that number of elements in memory the first place. Without knowing what you're really trying to achieve in your application, it's hard to advise on strategies.

    Generically however, in working with large event-based simulations, I've rarely encountered a need to have an event queue of more than about 30 events. The events usually trigger some changes of state (eg in a world model): as long as that state is tracked, it is rarely necessary to have large queues of contributing events in memory.

  11. #11
    Registered User
    Join Date
    Apr 2009
    Posts
    3
    For now, I solved the problem by figuring out the max size, using resize and then erase.

    Indeed, if the data sets get larger (I don't think it'll be the case, but I am not certain - will check it VERY soon) I'll look into the suggestion of playing with data types, and maybe, eliminating some variables that can be deduced from others (I've done something like this - the structure used to be larger).

    Values don't repeat - these are momenta and derived variables from a physics data set. I'll need to google for "alignment settings". Never heard this word combination with respect to programming. Thank you for your suggestions

  12. #12
    The larch
    Join Date
    May 2006
    Posts
    3,573
    using resize and then erase.
    That's what reserve does, except that resize should also call the copy constructor 18 million times to ensure that all the instances contain the same uninitialized garbage or all zeros - not sure about that - before you erase them.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  13. #13
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Don't use resize() & erase(), use reserve() instead. It makes the vector the size you tell reserve to make it without actually creating millions of default elements.

    But as others have pointed out, you still have a scalability issue, you're just putting a bandaid on gun shot wound. Find out how much of your array actually needs to be in memory at one time, then only load that much and leave the rest in a file; when you're done with that chunk of data, throw it away and load the next chunk of data into memory...
    "I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008

    "the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Need help understanding a problem
    By dnguyen1022 in forum C++ Programming
    Replies: 2
    Last Post: 04-29-2009, 04:21 PM
  2. Memory problem with Borland C 3.1
    By AZ1699 in forum C Programming
    Replies: 16
    Last Post: 11-16-2007, 11:22 AM
  3. Someone having same problem with Code Block?
    By ofayto in forum C++ Programming
    Replies: 1
    Last Post: 07-12-2007, 08:38 AM
  4. A question related to strcmp
    By meili100 in forum C++ Programming
    Replies: 6
    Last Post: 07-07-2007, 02:51 PM
  5. WS_POPUP, continuation of old problem
    By blurrymadness in forum Windows Programming
    Replies: 1
    Last Post: 04-20-2007, 06:54 PM