# Thread: bool vector operations - a more efficient way needed

1. ## bool vector operations - a more efficient way needed

hi,

well i am in a need of a more efficient way to do the following:

I am interested into overlapping intervals, actually their start and end position. What I have is a set of intervals (2-7, 4-19, 21-43 ect.) some of them overlap each other, some are embedded one into each other and some are separate. the number of intervals is large (high number of them is repeated and so on - basically what I am implying is that no pre-sorting is going to help) and the maximum an minimum interval borders tend to be between 1 and 100000000.

So I was thinking is vector<bool> - they are small (right ??) and my algorithm would go something like this:

Code:
```#include <vector>

using namespace std;

int main(){

for(different sets of borders){
// for each set of borders which are stored as simple vectors having pairs of 2 defining start and stop of the interval (vector<int> borders = {2,7,4,19,21,43 ...}), do the following:

vector<bool> vec(max,false) // max is the maximum value in borders vector
for(int i = 0; i< borders.size(); i+=2 )
for(int j = borders[i]; j<borders[i+1]; j++ )
vec[j]=true;

for(auto it = vec.begin(); it != vec.end(); it++){
//print start and stop positions of true intervals
}
}
return 0;
}```
is there a faster / more elegant way to do this ?

and yes memory-wise my borders in the above example outstrip my bool vector this is because in reality the intervals are received on-line and there in no actual border vector- that is also why sorting is not a solution.

Any suggestions? 2. Could you give an example input and output, I think that makes it much easier to understand. I think I understand your input, but I'm not sure what exactly it is you are trying to determine.

So I was thinking is vector<bool> - they are small (right ??)
Optimized for space (if that's your priority), yes. 3. a small example with the static input:
Code:
```#include <vector>
#include <iostream>

using namespace std;

int main(){

vector<bool> vec(1000,false);
vector <int> borders;
borders.push_back(23);
borders.push_back(43);
borders.push_back(63);
borders.push_back(71);
borders.push_back(21);
borders.push_back(29);
borders.push_back(3);
borders.push_back(19);
borders.push_back(221);
borders.push_back(233);

for(int i = 0; i< 10; i+=2 )
for(int j = borders[i]; j<borders[i+1]; j++ )
vec[j]=true;
int i=0;
bool x = false;
for(auto it = vec.begin(); it != vec.end(); it++){

if((*it == true && x==false) || (*it == false && x==true)){
cout << i ;
if(x==true){
x=false;
cout << ",";
}else{
cout << "-";
x=true;
}
}
i++;
}
cout << endl;
return 0;
}```
as a result it prints out:
3-19,21-43,63-71,221-233

which corresponds to intervals of true vector values. Note that 21-43 is an interval composed of 21-29 and 23-43 4. Thanks, I understand the goal now. Start/end pairs are the desired end result; the vector<bool> is just for intermediate computation, yes? And new input pairs arrive while the program is running, correct? I'll try to come up with something different tomorrow and reply here. 5. I thought about it and given the constraints you gave, your solution looks pretty good. So if a pair arrives, simply write its range into the bool vector. If you can spare the memory, you could try vector<char> with 0/1 to see if that's faster. vector<bool> is a special implementation that might not be as fast, even though it uses a lot less space.

Depending on how long this program receives input (you said a lot of the ranges might repeat) you could try going with a sorted vector still:
Code:
`vector<pair<int, int>> ranges;`
1. Have vector sorted by first element ("start").
2. When new pair arrives, do std::lower_bound on ranges, comparing against first element.
3. Then compare the returned iterator in detail to your new pair.
4. A1) If there is overlap and the new pair isn't fully inside the range, extend range's start and/or end values.
5. After extending it, compare iterator to the element before and after it to ensure they don't overlap now. If they do, "merge" them into a single element. (get rid of superfluous elements by swapping them with end of vector, then resizing)
6. A2) If there is overlap but the range fully contains the new pair, do nothing.
7. B) If there is no overlap with present ranges, push the new pair onto the ranges vector and sort the vector again by first element.

Sounds a lot more complicated than just writing incoming ranges into the bool/char vector, doesn't it? It might not be faster either. A disproportionately high occurrence of A2 cases might justify trying it. The only clear advantage, assuming the logic is sound, would be much smaller memory consumption. You'd only keep the actual start-end pairs in memory, not every possible element from 0 to 100 million or whatever your possible range is.

Someone with actual knowledge of algorithms can probably give you something way better, like a tree structure to quickly query and update ranges. 6. ##  Originally Posted by Guest I thought about it and given the constraints you gave, your solution looks pretty good. So if a pair arrives, simply write its range into the bool vector. If you can spare the memory, you could try vector<char> with 0/1 to see if that's faster. vector<bool> is a special implementation that might not be as fast, even though it uses a lot less space.

Depending on how long this program receives input (you said a lot of the ranges might repeat) you could try going with a sorted vector still:
Code:
`vector<pair<int, int>> ranges;`
1. Have vector sorted by first element ("start").
2. When new pair arrives, do std::lower_bound on ranges, comparing against first element.
3. Then compare the returned iterator in detail to your new pair.
4. A1) If there is overlap and the new pair isn't fully inside the range, extend range's start and/or end values.
5. After extending it, compare iterator to the element before and after it to ensure they don't overlap now. If they do, "merge" them into a single element. (get rid of superfluous elements by swapping them with end of vector, then resizing)
6. A2) If there is overlap but the range fully contains the new pair, do nothing.
7. B) If there is no overlap with present ranges, push the new pair onto the ranges vector and sort the vector again by first element.

Sounds a lot more complicated than just writing incoming ranges into the bool/char vector, doesn't it? It might not be faster either. A disproportionately high occurrence of A2 cases might justify trying it. The only clear advantage, assuming the logic is sound, would be much smaller memory consumption. You'd only keep the actual start-end pairs in memory, not every possible element from 0 to 100 million or whatever your possible range is.

Someone with actual knowledge of algorithms can probably give you something way better, like a tree structure to quickly query and update ranges.

Thank you 7. The best way to improve the algorithm might be to implement it with bitset. 8. Although now that I think about it, there is a way to do it without using a bunch of bits. Sorting can work. If you have all the intervals, than sort every pair of end points in increasing order. That is, (1, 2) comes before (1, 5) which comes before (2, 4) or what have you. std::pair already works with this ordering.

From there all you have to do is check the ways that intervals can overlap, except we know that there is no interval that comes before the ones being looked at.

Code:
```#include <algorithm>
#include <vector>
#include <iostream>
#include <utility>

using namespace std;
typedef pair<int, int> Interval;

void Merge(vector<Interval>& ranges)
{
sort(ranges.begin(), ranges.end());
for (size_t i = 0; i < ranges.size() - 1; )
{
size_t j = i + 1;
int a = ranges[i].first, b = ranges[i].second;
int x = ranges[j].first, y = ranges[j].second;
if (b - x > 0) // b is in x - y, so the range could be expressed as a - y
{
ranges[i] = Interval(a, y);
ranges.erase(ranges.begin() + j);
}
else if (x < a && y > b) // a - b lies between x - y
{
ranges.erase(ranges.begin() + i);
}
else i++;
}
}```
You would have to measure the performance of the two algorithms to see if one is faster. I just know this requires no extra space. Popular pages Recent additions 