I thought about it and given the constraints you gave, your solution looks pretty good. So if a pair arrives, simply write its range into the bool vector. If you can spare the memory, you could try
vector<char> with 0/1 to see if that's faster. vector<bool> is a special implementation that might not be as fast, even though it uses a lot less space.
Depending on how long this program receives input (you said a lot of the ranges might repeat) you could try going with a sorted vector still:
Code:
vector<pair<int, int>> ranges;
- Have vector sorted by first element ("start").
- When new pair arrives, do std::lower_bound on ranges, comparing against first element.
- Then compare the returned iterator in detail to your new pair.
- A1) If there is overlap and the new pair isn't fully inside the range, extend range's start and/or end values.
- After extending it, compare iterator to the element before and after it to ensure they don't overlap now. If they do, "merge" them into a single element. (get rid of superfluous elements by swapping them with end of vector, then resizing)
- A2) If there is overlap but the range fully contains the new pair, do nothing.
- B) If there is no overlap with present ranges, push the new pair onto the ranges vector and sort the vector again by first element.
Sounds a lot more complicated than just writing incoming ranges into the bool/char vector, doesn't it? It might not be faster either. A disproportionately high occurrence of
A2 cases might justify trying it. The only clear advantage, assuming the logic is sound, would be much smaller memory consumption. You'd only keep the actual start-end pairs in memory, not every possible element from 0 to 100 million or whatever your possible range is.
Someone with actual knowledge of algorithms can probably give you something way better, like a tree structure to quickly query and update ranges.