Bit arrays with STL containers

This is a discussion on Bit arrays with STL containers within the C++ Programming forums, part of the General Programming Boards category; Some time ago I went through the need to store 4 integrals ranging from 0-10 in the least possible space ...

  1. #1
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Portugal
    Posts
    7,510

    Bit arrays with STL containers

    Some time ago I went through the need to store 4 integrals ranging from 0-10 in the least possible space in memory. During the decision period of how I was going to store these values, I considered bitset<16> and vector<bool>(16) possibilities. But I convinced myself then these weren't good because my readings of the sizeof operator when applied on them gave me disheartening results. I ended up using unsigned char[2]. It's working.

    However just recently I go my hands on Herb Sutter's "Exceptional C++ Style" which discusses this very same issue and it says there both STL types have no memory overhead. So what gives?

    On my system:

    sizeof unsigned char[2] = 2
    sizeof bitset<16> = 4
    sizeof vector<bool>(16) = 14

    Why do I get these values then? What is sizeof reporting? Or... maybe the correct question is how do I know exactly how much a STL container occupies in memory?
    The programmer’s wife tells him: “Run to the store and pick up a loaf of bread. If they have eggs, get a dozen.”
    The programmer comes home with 12 loaves of bread.


    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #2
    Registered User Mortissus's Avatar
    Join Date
    Dec 2004
    Location
    Brazil, Porto Alegre
    Posts
    152
    I can only imagine that, for the bitset, the compiler tries to make the size a power of 2. This happens when I declare structs or classes, for example.

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,932
    sizeof does not report the number of bytes occupied by the contents of the container, but the number of bytes of the container as a structure.

    So for bitset<16>, it could well be that it stores a pointer, and the pointer happens to be 4 bytes on your system. For vector<bool>, perhaps the overhead of maintaining the vector causes it to be 14 bytes in size even though for the storage of bools there is no overhead of making it one byte per bool.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Registered User Mortissus's Avatar
    Join Date
    Dec 2004
    Location
    Brazil, Porto Alegre
    Posts
    152
    I found this, a piece of bitset header:
    Code:
    #define _GLIBCXX_BITSET_BITS_PER_WORD  numeric_limits<unsigned long>::digits
    #define _GLIBCXX_BITSET_WORDS(__n) \
     ((__n) < 1 ? 0 : ((__n) + _GLIBCXX_BITSET_BITS_PER_WORD - 1)/_GLIBCXX_BITSET_BITS_PER_WORD)
    
    namespace _GLIBCXX_STD
    {
      /**
       *  @if maint
       *  Base class, general case.  It is a class inveriant that _Nw will be
       *  nonnegative.
       *
       *  See documentation for bitset.
       *  @endif
      */
      template<size_t _Nw>
        struct _Base_bitset
        {
          typedef unsigned long _WordT;
    
          /// 0 is the least significant word.
          _WordT 		_M_w[_Nw];
    I think that proves my point:

    Code:
          typedef unsigned long _WordT;
    
          /// 0 is the least significant word.
          _WordT 		_M_w[_Nw];
    It declares an array of words (4 bytes each word, right?).

  5. #5
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Portugal
    Posts
    7,510
    hmm... That being the case bitset is really out of the question if I'm concerned with the amount of memory being used.

    vector<bool> on the other hand could be... but I suspect the pointer laserlight refers points probably also to a long...

    So these do seem to carry an overhead on my case despite the number of bits needed could fit in 2 bytes.

    I'm still somewhat confused since Sutter does mention they don't have any overload. But at least I know unsigned char[2] does what is expected
    The programmer’s wife tells him: “Run to the store and pick up a loaf of bread. If they have eggs, get a dozen.”
    The programmer comes home with 12 loaves of bread.


    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,932
    hmm... That being the case bitset is really out of the question if I'm concerned with the amount of memory being used.
    What kind of numbers are you looking at for the memory consumption? How are the four numbers going to be used?

    I'm still somewhat confused since Sutter does mention they don't have any overload. But at least I know unsigned char[2] does what is expected
    I do not have a copy of that book to check, but I suspect that what Sutter means by no overhead is that instead of taking a byte's worth of space for each bit (or bool) stored, a bitset (or vector<bool>) packs in as many bits (or bools) as possible. The overhead of maintaining the container surely would remain.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Portugal
    Posts
    7,510
    What kind of numbers are you looking at for the memory consumption? How are the four numbers going to be used?
    The class that wraps this unsigned char[2] (its only data member) is a standalone one. It is expected to be used inside a vector of up to 1 million elements.

    The actual need to minimize the memory usage to an absolute minimum is not real, mind you. I could do with that vector occupying 2 megabytes, for instance (that's twice what it does with unsigned char[2]). I just defined that need myself since during the C++ learning process I like to up the requirements whenever I can and try to deal with the new obstacles.

    So, the real question is "what if I needed to pack it in the least possible space"
    The programmer’s wife tells him: “Run to the store and pick up a loaf of bread. If they have eggs, get a dozen.”
    The programmer comes home with 12 loaves of bread.


    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    835
    I'm dealing with a similar situation where I need a large number of instances of a class where each instance only needs a fairly small, but variable, amount of memory. I finally decided to allocate a single large memory pool in the form of an array, and pass pointers to pieces of it to the individual instances. That basically eliminates the per-instance overhead (each instance still needs to know the size of its chunk somehow), and also memory fragmentation.

    Edit: In your case, it sounds like the amount of memory each instance needs is fixed, so it doesn't have to store the size. BTW, if you REALLY need to minimize the storage, since you have 4 variables each with 11 possible values, the number of possible combinations is 11^4 == 14641 < 2^14, so you only need 14 bits, if you're willing to do the extra processing to extract the values.
    Last edited by robatino; 07-02-2007 at 10:19 AM.

  9. #9
    Registered User manofsteel972's Avatar
    Join Date
    Mar 2004
    Posts
    317
    Edit:

    If you define each of the bits you could hold all 4 numbers using 18 bits. The first two bits would determine 0-4 then next 4 bits would be the number from 0-16 in your case you wouldn't need the extra so
    00 1010 would be the 0th number ==10 if you did 01 0001 that would the the 1st number ==1 and so on.

    If you pack it all together you would have

    00 0000 0000 0000 0000 with the first two bits indicating which 4 bit groups is currently selected
    Last edited by manofsteel972; 07-02-2007 at 11:57 AM.
    "Knowledge is proud that she knows so much; Wisdom is humble that she knows no more."
    -- Cowper

    Operating Systems=Slackware Linux 9.1,Windows 98/Xp
    Compilers=gcc 3.2.3, Visual C++ 6.0, DevC++(Mingw)

    You may teach a person from now until doom's day, but that person will only know what he learns himself.

    Now I know what doesn't work.

    A problem is understood by solving it, not by pondering it.

    For a bit of humor check out xkcd web comic http://xkcd.com/235/

  10. #10
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,932
    Just came to mind: is unsigned short 2 bytes on your system? That may be even better than unsigned char[2].
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  11. #11
    Registered User
    Join Date
    Jan 2005
    Posts
    7,340
    Where in Exceptional C++ Style is Sutter's discussion?

    >> the real question is "what if I needed to pack it in the least possible space"
    If that is the real question, I would not use the STL (unless you could find a solution that really does have zero extra overhead, which I doubt). The STL is recommended so often because in most situations you don't really need to pack it in the least possible space.

  12. #12
    Registered User
    Join Date
    Sep 2001
    Posts
    752
    vector<bool> can't possibly have 0 overhead, because of the operations it must support. It is going to have to keep at least a pointer and two size variables.

    bitset<16>... well, this becomes a discussion of what is 'overhead'.

    Honestly though, have you considered just using bit fields?
    Code:
    #include <iostream>
    
    struct pakd {
       unsigned char a:4;
       unsigned char b:4;
       unsigned char c:4;
       unsigned char d:4;
    };
    
    int main (void) {
       std::cout << sizeof(pakd);
       return 0;
    }
    Callou collei we'll code the way
    Of prime numbers and pings!

  13. #13
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,893
    Typical vectors have three pointers and no size variables. I wonder where your additional two bytes come from.

    I think bit fields would be a good solution.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  14. #14
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Portugal
    Posts
    7,510
    Quote Originally Posted by laserlight
    Just came to mind: is unsigned short 2 bytes on your system? That may be even better than unsigned char[2].
    Didn't ever think of that
    You are absolutely right.

    Quote Originally Posted by manofsteel972
    If you define each of the bits you could hold all 4 numbers using 18 bits.
    Nah. No need to be fiddling around when I can pack them in 16 bits.

    Quote Originally Posted by Daved
    Where in Exceptional C++ Style is Sutter's discussion?
    I apologize since I've been buying a series of books lately and got confused. Exceptional C++ Style does discuss this issue in detail. You can find it in Chapter 27, Data Formats and Efficiency Part 2).

    However, it was in Josutti's C++ Standard Library (6.2.6 Class vector<bool>) that I've read about the fact that "The vector<bool> specialization usually uses internally only 1 bit for an element" and of bitset being also a bitfield, although with static size.

    Quote Originally Posted by Daved
    If that is the real question, I would not use the STL (unless you could find a solution that really does have zero extra overhead, which I doubt).
    Yeah. I guess I learned that through this discussion

    Quote Originally Posted by QuestionC
    Honestly though, have you considered just using bit fields?
    I was totally unaware of that language feature. The section about portability made me wonder though... Do you know of any present problems in their usage among the recent Linux, Mac and Windows flavors?
    The programmer’s wife tells him: “Run to the store and pick up a loaf of bread. If they have eggs, get a dozen.”
    The programmer comes home with 12 loaves of bread.


    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  15. #15
    Registered User
    Join Date
    Jan 2005
    Posts
    7,340
    >> I apologize since I've been buying a series of books lately and got confused.
    I wasn't trying to imply that it doesn't say that, I was just curious and wanted to read the context myself. Unfortunately I misread Exceptional C++ Style as Exceptional C++, and so as it turns out I can't easily look it up. Thanks for providing the information anyway.

Page 1 of 2 12 LastLast
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Noob STL question about Containers.
    By Swerve in forum C++ Programming
    Replies: 2
    Last Post: 03-15-2009, 12:02 PM
  2. bit level permutation function
    By zxcv in forum C Programming
    Replies: 2
    Last Post: 07-27-2008, 01:26 PM
  3. bind on member functions of stl containers
    By synss in forum C++ Programming
    Replies: 3
    Last Post: 07-03-2008, 04:38 AM
  4. pointers & arrays and realloc!
    By zesty in forum C Programming
    Replies: 14
    Last Post: 01-19-2008, 03:24 PM
  5. stl - searching for elements in containers
    By Raven Arkadon in forum C++ Programming
    Replies: 4
    Last Post: 03-24-2005, 10:10 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21