Thread: Finding duplicates and merging

  1. #1
    Registered User
    Join Date
    Apr 2004
    Posts
    22

    Finding duplicates and merging

    Hello All,

    I am working on two algorithms and your help will eb appreciated.

    1. I have an array of any length( 10 to 10000 element), and some items are duplicated. Need to write algo that removes the duplcated items and keeps only one item whose index is less.

    2. Two arrau of any lentgh and some lements are common. Need to be merged by removing duplicated item.

    I am working on these algo and will be posting the code soon. ANy help before that os appreciated.

    Thanks,
    Happy Man for (;

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    The first solution that comes to mind:

    1) Sort and copy only the first of any duplicated elements. The sort step makes this part trivial.

    2) Copy the two arrays into one, sort, then perform task 1.
    My best code is written with the delete key.

  3. #3
    Registered User
    Join Date
    Apr 2004
    Posts
    22
    For me sorting is not necessary or rather it is not part of requirment. Actually, duplicates need to be removed but order of elements should be kept same.

    e.g. arr[] = {4,6,3,4,2,1,6,9};
    After execution
    arr[] = {4,6,3,,2,1,9};

    Any inputs????

  4. #4
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >duplicates need to be removed but order of elements should be kept same
    Then a frequency table would be a good option:
    Code:
    #include <stdio.h>
    
    int main ( void )
    {
      int arr[] = {4,6,3,4,2,1,6,9};
      int save[8];
      int dup[10] = {0};
      int i, j = 0;
    
      for ( i = 0; i < 8; i++ ) {
        if ( dup[ arr[i] ] == 0 ) {
          save[j++] = arr[i];
          dup[ arr[i] ] = 1;
        }
      }
      for ( i = 0; i < j; i++ ) {
        printf ( "%d ", save[i] );
      }
      printf ( "\n" );
    
      return 0;
    }
    You would need to modify the above to scale well to the quantities for your problem. For example, a binary search tree would more easily handle large quantities of values in a broad range than a table, but you lose the benefit of constant time searching.
    My best code is written with the delete key.

  5. #5
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Its really not possible to "remove" an object out of an array, since that spot will have to have *some* value. What you can do is to create another array to put the results into. I'll leave it up to you to improve upon this code
    Code:
    #include <stdio.h>
    
    int main(void)
    {
      int arr[]= {4,6,3,4,2,1,6,9};
      int dest[sizeof arr/sizeof arr[0]];
      int count, count2, size, open=0;
      size = sizeof arr/sizeof arr[0];
      printf("size: %d\n", size);
    
      for (count=0; count < size; count++)
        for (count2=0; count2<size; count2++)
          if ( arr[count] == arr[count2] && count2 < count)
          {
            open++;
            break;
          }
          else
          {
            dest[count-open] = arr[count];
          }
      printf("Open: %d\n", open);
      for (count=0; count<size-open; count++)
      {
        printf("%d\n", dest[count]);
      }
    
      return 0;
    }

  6. #6
    Registered User
    Join Date
    Apr 2004
    Posts
    22
    Thantos,

    Is there any way we can eliminate use of second array ?

    Thanks,
    Happy Man

  7. #7
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Is there any way we can eliminate use of second array ?
    Yes, but it wouldn't be wise when your array will potentially have up to 10,000 elements. The cost of shifting elements to repair holes would be far too great if there are a lot of duplicates anywhere except the end of the array.
    My best code is written with the delete key.

  8. #8
    Registered User linuxdude's Avatar
    Join Date
    Mar 2003
    Location
    Louisiana
    Posts
    926
    Prelude(or anyone knowledgeable on the subject) could you explain frequency tables. It is odd to me. And array with an array for an element(but I know that that is not what it is)?

  9. #9
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Prelude(or anyone knowledgeable on the subject) could you explain frequency tables.
    Imagine an array that is big enough to hold any possible value in a set (such as 0-9). The value itself is the index, and the contents of the frequency table would be a count of how many values matching that index were found. So a frequency table of the set {4,6,3,4,2,1,6,9} would be:
    Code:
    freq[0] = 0
    freq[1] = 1
    freq[2] = 1
    freq[3] = 1
    freq[4] = 2
    freq[5] = 0
    freq[6] = 2
    freq[7] = 0
    freq[8] = 0
    freq[9] = 1
    That's all it is at its simplest, an array telling you how many of each of the values are in the set. It gets harder when you know that the range of values will be large or sparse. Then you'll need to resort to a data structure better suited to holding unique items and marking the frequency without being too wasteful (such as a binary search tree).
    My best code is written with the delete key.

  10. #10
    Registered User linuxdude's Avatar
    Join Date
    Mar 2003
    Location
    Louisiana
    Posts
    926
    Code:
        if ( dup[ arr[i] ] == 0 ) {
          save[j++] = arr[i];
          dup[ arr[i] ] = 1;
    I have having trouble with the first and 3rd lines. dup[arr[i]] is that the same as dup[the value of arr[i]]?? Thanx for your patience, but this is pretty interesting

  11. #11
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >dup[arr[i]] is that the same as dup[the value of arr[i]]??
    Yes. If the value at arr[i] is 5 then the code would be the same as saying dup[5].
    My best code is written with the delete key.

  12. #12
    Registered User
    Join Date
    Apr 2004
    Posts
    22
    Like algorthms for sorting, are there any standard algorithms for finding duplicates in array? Or can we use same sorting algorithms to find duplicates? Instead checking > or < check for ==.

    Please let me know

  13. #13
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >are there any standard algorithms for finding duplicates in array?
    The "obvious" and most common algorithm is to walk over the array in an inner loop testing for equivalence with the current item in the outer loop:
    Code:
    for ( i = 0; i < n; i++ ) {
      for ( j = 0; j < n; j++ ) {
        if ( a[i] == a[j] ) {
          /* Handle duplicate */
        }
      }
    }
    Naturally, this is dreadfully inefficient for large arrays.
    My best code is written with the delete key.

  14. #14
    C++ Developer XSquared's Avatar
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    2,718
    >Naturally, this is dreadfully inefficient for large arrays.
    Do you enjoy leaving us haning? I'd really like to know what a more efficient method is for larger arrays... come on... please??? *sad puppy eyes*
    Naturally I didn't feel inspired enough to read all the links for you, since I already slaved away for long hours under a blistering sun pressing the search button after typing four whole words! - Quzah

    You. Fetch me my copy of the Wall Street Journal. You two, fight to the death - Stewie

  15. #15
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >I'd really like to know what a more efficient method is for larger arrays
    See my previous posts then. My first suggestion was O(n log n), my second was O(n), and my improvement of the second for larger ranges was O(log n). All of them are substantially better than the O(n^2) nested loop solution.
    My best code is written with the delete key.

Popular pages Recent additions subscribe to a feed