Thread: bool vector operations - a more efficient way needed

  1. #1
    Registered User
    Join Date
    Jan 2011
    Posts
    222

    bool vector operations - a more efficient way needed

    hi,

    well i am in a need of a more efficient way to do the following:

    I am interested into overlapping intervals, actually their start and end position. What I have is a set of intervals (2-7, 4-19, 21-43 ect.) some of them overlap each other, some are embedded one into each other and some are separate. the number of intervals is large (high number of them is repeated and so on - basically what I am implying is that no pre-sorting is going to help) and the maximum an minimum interval borders tend to be between 1 and 100000000.

    So I was thinking is vector<bool> - they are small (right ??) and my algorithm would go something like this:


    Code:
    #include <vector>
    
    using namespace std;
    
    int main(){
    
       for(different sets of borders){
          // for each set of borders which are stored as simple vectors having pairs of 2 defining start and stop of the interval (vector<int> borders = {2,7,4,19,21,43 ...}), do the following:
    
          vector<bool> vec(max,false) // max is the maximum value in borders vector 
          for(int i = 0; i< borders.size(); i+=2 )
             for(int j = borders[i]; j<borders[i+1]; j++ )
                vec[j]=true;
    
          for(auto it = vec.begin(); it != vec.end(); it++){
             //print start and stop positions of true intervals
          }
       }
      return 0;
    }
    is there a faster / more elegant way to do this ?

    and yes memory-wise my borders in the above example outstrip my bool vector this is because in reality the intervals are received on-line and there in no actual border vector- that is also why sorting is not a solution.

    Any suggestions?

  2. #2
    Guest
    Guest
    Could you give an example input and output, I think that makes it much easier to understand. I think I understand your input, but I'm not sure what exactly it is you are trying to determine.

    So I was thinking is vector<bool> - they are small (right ??)
    Optimized for space (if that's your priority), yes.

  3. #3
    Registered User
    Join Date
    Jan 2011
    Posts
    222
    a small example with the static input:
    Code:
    #include <vector>
    #include <iostream>
    
    using namespace std;
    
    int main(){
    
      vector<bool> vec(1000,false);
      vector <int> borders;
      borders.push_back(23);
      borders.push_back(43);
      borders.push_back(63);
      borders.push_back(71);
      borders.push_back(21);
      borders.push_back(29);
      borders.push_back(3);
      borders.push_back(19);
      borders.push_back(221);
     borders.push_back(233);
    
      for(int i = 0; i< 10; i+=2 )
        for(int j = borders[i]; j<borders[i+1]; j++ )
          vec[j]=true;
      int i=0;
      bool x = false;
      for(auto it = vec.begin(); it != vec.end(); it++){
        
        if((*it == true && x==false) || (*it == false && x==true)){
          cout << i ;
          if(x==true){
            x=false;
            cout << ",";
          }else{
            cout << "-";
            x=true;
          }
        }
        i++;
      }
      cout << endl;
      return 0;
    }
    as a result it prints out:
    3-19,21-43,63-71,221-233

    which corresponds to intervals of true vector values. Note that 21-43 is an interval composed of 21-29 and 23-43
    Last edited by baxy; 11-05-2015 at 08:23 PM.

  4. #4
    Guest
    Guest
    Thanks, I understand the goal now. Start/end pairs are the desired end result; the vector<bool> is just for intermediate computation, yes? And new input pairs arrive while the program is running, correct? I'll try to come up with something different tomorrow and reply here.

  5. #5
    Guest
    Guest
    I thought about it and given the constraints you gave, your solution looks pretty good. So if a pair arrives, simply write its range into the bool vector. If you can spare the memory, you could try vector<char> with 0/1 to see if that's faster. vector<bool> is a special implementation that might not be as fast, even though it uses a lot less space.

    Depending on how long this program receives input (you said a lot of the ranges might repeat) you could try going with a sorted vector still:
    Code:
    vector<pair<int, int>> ranges;
    1. Have vector sorted by first element ("start").
    2. When new pair arrives, do std::lower_bound on ranges, comparing against first element.
    3. Then compare the returned iterator in detail to your new pair.
    4. A1) If there is overlap and the new pair isn't fully inside the range, extend range's start and/or end values.
    5. After extending it, compare iterator to the element before and after it to ensure they don't overlap now. If they do, "merge" them into a single element. (get rid of superfluous elements by swapping them with end of vector, then resizing)
    6. A2) If there is overlap but the range fully contains the new pair, do nothing.
    7. B) If there is no overlap with present ranges, push the new pair onto the ranges vector and sort the vector again by first element.

    Sounds a lot more complicated than just writing incoming ranges into the bool/char vector, doesn't it? It might not be faster either. A disproportionately high occurrence of A2 cases might justify trying it. The only clear advantage, assuming the logic is sound, would be much smaller memory consumption. You'd only keep the actual start-end pairs in memory, not every possible element from 0 to 100 million or whatever your possible range is.

    Someone with actual knowledge of algorithms can probably give you something way better, like a tree structure to quickly query and update ranges.
    Last edited by Guest; 11-06-2015 at 11:48 AM.

  6. #6
    Registered User
    Join Date
    Jan 2011
    Posts
    222

    Smile

    Quote Originally Posted by Guest View Post
    I thought about it and given the constraints you gave, your solution looks pretty good. So if a pair arrives, simply write its range into the bool vector. If you can spare the memory, you could try vector<char> with 0/1 to see if that's faster. vector<bool> is a special implementation that might not be as fast, even though it uses a lot less space.

    Depending on how long this program receives input (you said a lot of the ranges might repeat) you could try going with a sorted vector still:
    Code:
    vector<pair<int, int>> ranges;
    1. Have vector sorted by first element ("start").
    2. When new pair arrives, do std::lower_bound on ranges, comparing against first element.
    3. Then compare the returned iterator in detail to your new pair.
    4. A1) If there is overlap and the new pair isn't fully inside the range, extend range's start and/or end values.
    5. After extending it, compare iterator to the element before and after it to ensure they don't overlap now. If they do, "merge" them into a single element. (get rid of superfluous elements by swapping them with end of vector, then resizing)
    6. A2) If there is overlap but the range fully contains the new pair, do nothing.
    7. B) If there is no overlap with present ranges, push the new pair onto the ranges vector and sort the vector again by first element.

    Sounds a lot more complicated than just writing incoming ranges into the bool/char vector, doesn't it? It might not be faster either. A disproportionately high occurrence of A2 cases might justify trying it. The only clear advantage, assuming the logic is sound, would be much smaller memory consumption. You'd only keep the actual start-end pairs in memory, not every possible element from 0 to 100 million or whatever your possible range is.

    Someone with actual knowledge of algorithms can probably give you something way better, like a tree structure to quickly query and update ranges.

    Thank you

  7. #7
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    The best way to improve the algorithm might be to implement it with bitset.

  8. #8
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Although now that I think about it, there is a way to do it without using a bunch of bits. Sorting can work. If you have all the intervals, than sort every pair of end points in increasing order. That is, (1, 2) comes before (1, 5) which comes before (2, 4) or what have you. std::pair already works with this ordering.

    From there all you have to do is check the ways that intervals can overlap, except we know that there is no interval that comes before the ones being looked at.

    Code:
    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <utility>
    
    
    using namespace std;
    typedef pair<int, int> Interval;
    
    
    void Merge(vector<Interval>& ranges)
    { 
        sort(ranges.begin(), ranges.end());
        for (size_t i = 0; i < ranges.size() - 1; ) 
        {
            size_t j = i + 1;
            int a = ranges[i].first, b = ranges[i].second;
            int x = ranges[j].first, y = ranges[j].second;
            if (b - x > 0) // b is in x - y, so the range could be expressed as a - y
            {
                ranges[i] = Interval(a, y);
                ranges.erase(ranges.begin() + j);
            }
            else if (x < a && y > b) // a - b lies between x - y
            {
                ranges.erase(ranges.begin() + i);
            }
            else i++;   
        }
    }
    You would have to measure the performance of the two algorithms to see if one is faster. I just know this requires no extra space.
    Last edited by whiteflags; 11-08-2015 at 03:59 AM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. How get bool's from std::vector<bool>
    By 6tr6tr in forum C++ Programming
    Replies: 6
    Last Post: 04-14-2008, 05:24 AM
  2. Vector Operations in 3D Graphics
    By The Dog in forum Game Programming
    Replies: 27
    Last Post: 09-13-2005, 05:11 PM
  3. vector<bool> push_back
    By ygfperson in forum C++ Programming
    Replies: 4
    Last Post: 03-05-2003, 08:48 PM