how to compute set intersection efficiently?

**George2** · 06-01-2006

Hello everyone,

I need to compute the intersection of two (or more) sets. I will use C or C++, but STL algorithms are not enabled to be used. I have Googled, but it seems that there are only union algorithms and no intersection algorithms.

It is not my homework, and it is my interest to improve the application I am working on now. Does anyone have any reference materials or any ideas of how to design the intersection operation of sets efficiently.

thanks in advance,
George

**Salem** · 06-01-2006

Well if sets were bits, then

a & b is intersection
a | b is union

**George2** · 06-02-2006

Hi Salem,

Originally Posted by Salem

Well if sets were bits, then

a & b is intersection
a | b is union

It is composed of integers. Any ideas?

regards,
George

**Salem** · 06-02-2006

> It is composed of integers. Any ideas?
How many integers
What range of integers

What are your ideas?

> how to design the intersection operation of sets efficiently.
Try not to engage in premature optimisation.
If your design and interfaces are clean, then rewriting bits of code AFTER you've done a profile run on the completed project will tell you what you really need to be concentrating on.

**itsme86** · 06-02-2006

If the sets are sorted you'll be able to optomize your intersection algorithm pretty easily. Of course, that means a performance hit when adding numbers to your sets.

But if your sets are sorted you can use binary searches for the intersection (and for insertions) which are pretty speedy. Binary searches also scale really well.

Here's a start I came up with:

Code:

typedef struct
{
  int *set;
  size_t size;
} set_t;

size_t binsearch(int *haystack, int needle, size_t size)
{
  size_t half = size / 2;
  size_t res;

  if(!size)
    return (size_t)-1;

  if(needle == haystack[half])
    return half;

  if(size < 2)
    return (size_t)-1;

  if(needle < haystack[half])
    return binsearch(haystack, needle, half);

  res = binsearch(haystack + half, needle, size - half);

  return res == (size_t)-1 ? res : res + half;
}

size_t insertionpoint(int *haystack, int needle, size_t size)
{
  size_t half = size / 2;

  if(size < 2)
    return haystack[0] < needle;

  if(haystack[half] > needle)
    return insertionpoint(haystack, needle, half);
  return insertionpoint(haystack + half, needle, size - half) + half;
}

set_t *set_init(void)
{
  set_t *set;

  if(!(set = malloc(sizeof(set_t))))
    return NULL;

  set->set = NULL;
  set->size = 0;

  return set;
}

void set_free(set_t *set)
{
  if(set->set)
    free(set->set);
  free(set);
}

size_t set_insert(set_t *set, int num)
{
  size_t i;

  if(binsearch(set->set, num, set->size) == (size_t)-1)
  {
    int *temp;

    if(!(temp = realloc(set->set, sizeof(int) * (set->size + 1))))
      return 0;

    i = set->size ? insertionpoint(temp, num, set->size) : 0;

    memmove(temp + i + 1, temp + i, sizeof(int) * (set->size - i));
    temp[i] = num;

    set->set = temp;
    set->size++;
  }

  return set->size;
}

set_t *set_intersect(set_t *set1, set_t *set2)
{
  set_t *new;
  size_t i;

  if(!(new = set_init()))
    return NULL;

  for(i = 0;i < set1->size;++i)
    if(binsearch(set2->set, set1->set[(int)i], set2->size) != (size_t)-1)
      if(!(set_insert(new, set1->set[(int)i])))
      {
        set_free(new);
        return NULL;
      }

  return new;
}

I'm sure there's still lots of room for improvement.

Thread: how to compute set intersection efficiently?

Thread Tools

Search Thread

Display

how to compute set intersection efficiently?

Similar Threads

my code to prove working set larger than virtual bytes

Still confused why working set larger than virtual memory

need help debugging a program counting bits set from an integer number

6 measly errors

My graphics library