Thread: how to compute set intersection efficiently?

  1. #1
    Registered User
    Join Date
    May 2006
    Posts
    1,579

    how to compute set intersection efficiently?

    Hello everyone,


    I need to compute the intersection of two (or more) sets. I will use C or C++, but STL algorithms are not enabled to be used. I have Googled, but it seems that there are only union algorithms and no intersection algorithms.

    It is not my homework, and it is my interest to improve the application I am working on now. Does anyone have any reference materials or any ideas of how to design the intersection operation of sets efficiently.


    thanks in advance,
    George

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well if sets were bits, then

    a & b is intersection
    a | b is union
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    May 2006
    Posts
    1,579
    Hi Salem,


    Quote Originally Posted by Salem
    Well if sets were bits, then

    a & b is intersection
    a | b is union
    It is composed of integers. Any ideas?


    regards,
    George

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > It is composed of integers. Any ideas?
    How many integers
    What range of integers

    What are your ideas?

    > how to design the intersection operation of sets efficiently.
    Try not to engage in premature optimisation.
    If your design and interfaces are clean, then rewriting bits of code AFTER you've done a profile run on the completed project will tell you what you really need to be concentrating on.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    If the sets are sorted you'll be able to optomize your intersection algorithm pretty easily. Of course, that means a performance hit when adding numbers to your sets.

    But if your sets are sorted you can use binary searches for the intersection (and for insertions) which are pretty speedy. Binary searches also scale really well.

    Here's a start I came up with:
    Code:
    typedef struct
    {
      int *set;
      size_t size;
    } set_t;
    
    size_t binsearch(int *haystack, int needle, size_t size)
    {
      size_t half = size / 2;
      size_t res;
    
      if(!size)
        return (size_t)-1;
    
      if(needle == haystack[half])
        return half;
    
      if(size < 2)
        return (size_t)-1;
    
      if(needle < haystack[half])
        return binsearch(haystack, needle, half);
    
      res = binsearch(haystack + half, needle, size - half);
    
      return res == (size_t)-1 ? res : res + half;
    }
    
    size_t insertionpoint(int *haystack, int needle, size_t size)
    {
      size_t half = size / 2;
    
      if(size < 2)
        return haystack[0] < needle;
    
      if(haystack[half] > needle)
        return insertionpoint(haystack, needle, half);
      return insertionpoint(haystack + half, needle, size - half) + half;
    }
    
    set_t *set_init(void)
    {
      set_t *set;
    
      if(!(set = malloc(sizeof(set_t))))
        return NULL;
    
      set->set = NULL;
      set->size = 0;
    
      return set;
    }
    
    void set_free(set_t *set)
    {
      if(set->set)
        free(set->set);
      free(set);
    }
    
    size_t set_insert(set_t *set, int num)
    {
      size_t i;
    
      if(binsearch(set->set, num, set->size) == (size_t)-1)
      {
        int *temp;
    
        if(!(temp = realloc(set->set, sizeof(int) * (set->size + 1))))
          return 0;
    
        i = set->size ? insertionpoint(temp, num, set->size) : 0;
    
        memmove(temp + i + 1, temp + i, sizeof(int) * (set->size - i));
        temp[i] = num;
    
        set->set = temp;
        set->size++;
      }
    
      return set->size;
    }
    
    set_t *set_intersect(set_t *set1, set_t *set2)
    {
      set_t *new;
      size_t i;
    
      if(!(new = set_init()))
        return NULL;
    
      for(i = 0;i < set1->size;++i)
        if(binsearch(set2->set, set1->set[(int)i], set2->size) != (size_t)-1)
          if(!(set_insert(new, set1->set[(int)i])))
          {
            set_free(new);
            return NULL;
          }
    
      return new;
    }
    I'm sure there's still lots of room for improvement.
    Last edited by itsme86; 06-02-2006 at 12:43 PM.
    If you understand what you're doing, you're not learning anything.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 8
    Last Post: 01-18-2008, 04:06 AM
  2. Replies: 4
    Last Post: 01-13-2008, 02:14 AM
  3. Replies: 7
    Last Post: 08-19-2007, 08:10 AM
  4. 6 measly errors
    By beene in forum Game Programming
    Replies: 11
    Last Post: 11-14-2006, 11:06 AM
  5. My graphics library
    By stupid_mutt in forum C Programming
    Replies: 3
    Last Post: 11-26-2001, 06:05 PM