# Thread: Two int Arrays, remove duplicates from second

1. ## Two int Arrays, remove duplicates from second

Code:
```/*
generals is the first array. Max 10 elements.
numGenerals is the element count of generals.

genBuff is the second array; it is to be checked/pruned.
genCount is the element count of genBuff.
genBuff will be a max of 171, but be pruned to no more than 10, and no more than the complement of the element count of generals.
*/

for(int i = 0; i < numGenerals; i++)
for(int j = 0; j < genCount; j++)
if(generals[i] == genBuff[j])
{
for(int k = j; k < genCount - 1; k++)
genBuff[k] = genBuff[k + 1];
genCount--;
}```
Is there a better way to do this that is still easy for someone to read and understand? (I do have comments in the actual source, different from above).

I have two int arrays. They hold values from 0 to 170. The first one will never be more than 10. The second will be at most 171, but will be whittled down to at most 10, usually less. 171 is worst case, most users of this particular program will probably be reasonable and not try to add all 171 (max is 10 anyway). The first array is the original array. The second array is a temporary array. Any value in the second array that is also found in the first array, is removed from the second array, since all values in the first one must be unique. After this pruning process, both arrays will collectively contain no more than 10 unique elements; the elements from the second will be added to the first.

So right now I have three nested loops. I figured with the miniscule array sizes it wouldn't be a big deal. I can think of a way to remove one or two of them, but I want to be sure that I'm still writing clean, legible, good-practice code. The first loop walks through the first array. For each element in the first array, there is a second loop to walk through the second array to check for duplicates. If a duplicate is found, the third loop walks through the second array to overwrite the duplicate while preserving the second loop's position (j).

Is this dumb? I know that the big O gets worse and worse the deeper you go with nested loops. Even though the arrays are really tiny, is this still a thing to avoid?

2. You could sort both arrays, then a single parallel pass through both arrays would be enough to find the elements from the second array that are in the first array. Or since the first array will always be tiny, sorting the second array will suffice, upon which you can use binary search to find the elements from the second array that match those in the first array. However, "removing" a duplicate from the second array by shifting elements will take linear time, i.e., the same time complexity as finding if a given element from the first array is in the second array without sorting the second array. Therefore, this sorting will not be all that useful.

Originally Posted by Sorin
I figured with the miniscule array sizes it wouldn't be a big deal.
Probably.

3. Your function doesn't work. After shifting the elements back you need to decrement j as well. Otherwise you will not detect if the next element is a duplicate as well.
Kurt

4. You don't need to continuously shift the elements back when you find one in the array. All you really need to do is one pass through the array like so:

Code:
```size_t rm(int *a, size_t n, int v)
{
size_t to, from;

for (to = from = 0; from < n; to += a[from++] != v)
a[to] = a[from];
}```
I suggest you break your code up into fewer smaller functions that each do a single thing. You'll find it easier to manage that way.

5. I guess there's a few things I should have mentioned that I took for granted.

Originally Posted by laserlight
You could sort both arrays, then a single parallel pass through both arrays would be enough to find the elements from the second array that are in the first array. Or since the first array will always be tiny, sorting the second array will suffice, upon which you can use binary search to find the elements from the second array that match those in the first array. However, "removing" a duplicate from the second array by shifting elements will take linear time, i.e., the same time complexity as finding if a given element from the first array is in the second array without sorting the second array. Therefore, this sorting will not be all that useful.
First is that the arrays are both already sorted. The first array is built from a binary file, and the elements are always present in the file in sorted order (lowest to highest).

Originally Posted by ZuK
Your function doesn't work. After shifting the elements back you need to decrement j as well. Otherwise you will not detect if the next element is a duplicate as well.
Kurt
Second, is that the second array will never contain duplicates in itself. The array is generated by Windows through a listbox with the multi-selection style, so the array will always contain unique elements within itself since it is just a list of selected indexes. Plus, they're sorted.

BUT, I do see your point in that if I was working with an array where duplicates were possible within itself, I'd need to decrement j after moving everything over in order to not skip over elements, so thanks for pointing that out.

Thanks to both of you for the pointers, I appreciate the input.

Originally Posted by Barney McGrew
You don't need to continuously shift the elements back when you find one in the array. All you really need to do is one pass through the array like so:

Code:
```size_t rm(int *a, size_t n, int v)
{
size_t to, from;

for (to = from = 0; from < n; to += a[from++] != v)
a[to] = a[from];
}```
I suggest you break your code up into fewer smaller functions that each do a single thing. You'll find it easier to manage that way.
I'll have to decipher what's going on here in the morning. I assume this is still considered good code, and that I just need more practice reading other peoples code.

6. Barney's code can be rewritten so it is easier to understand.

Code:
```for ( from = to = 0; from < n; from++ ) {
a[to] = a[from];
if ( a[from] != v ) to++;
}```
If it makes any difference to you this code will result in as many assignments as there are duplicates. So in the array (1, 2, 2, 2, 3) the two in the second position is actually assigned three times, once for each 2, and then the to counter gets incremented once to properly place the 3, e.g. (1, 2, 3).