importance of distinguishing 'deleted' vs 'empty' indices via open addressing

**monkey_c_monkey** · 07-09-2012

Why is it so important to mark the difference? Why can't an item found just have its index marked as empty. For my trivial case, I'm thinking of initialize an array w/ empty signified by the value 0.

so:

Code:

int list[10] = {0,0,0,0,0,0,0,0,0,0};

So the function for removing an item could be:

Code:

int open_address_remove(int* list, int size; int key, int data)
{
  for ( int i = 0; i < size; ++i )
  {
    int cur_bucket = hash(key, i);

    if ( list[cur_bucket] == data )
    {
       list[cur_bucket] = 0;//Why do books, online tutorials make it so important to distinguish b/t empty and deleted indices?
      return cur_bucket;//return where the item to removed was found
     }
  }

  return -1;//return -1 to represent item-to-remove not found
}

Here is one very helpful link I used, it says to indicate a stop condition is when position == hash(key) but for my code above, coming full circle (so ensuriing all indices checked) is by the counter i so why this importance as this tutorial makes.
http://webdocs.cs.ualberta.ca/~holte/T26/open-addr.html

What am I not getting here? Appreciate all the help I can get.

**laserlight** · 07-09-2012

Whether it is important to distinguish between an object in some kind of "empty" state and some kind of "deleted" state depends on what you are doing. In fact, in your array of 10 elements, all the elements exist. It is merely by some convention or designation that you regard some elements as "empty" or "deleted" and others as "in use".

**King Mir** · 07-10-2012

In the context of that lecture, I think the distinction is necessary because of the implementation of the lookup function. To look up a value at a specific key, you need to check hash(key,0), then all subsequent keys until an empty one is found or the . Deleted key need to be skipped over, but do not mean that the element is not found.

As an optimisation, you could mark list[cur_bucket] as empty when list[cur_bucket+1] is empty.

EDIT 3:
At the bottom of the link you posted:

Suppose now we DELETE 35. It is hashed to position 5, which is occupied, and, using linear probing, we look next in position 6, where we find what we are looking for. Position 6 is marked as DELETED, not EMPTY.

To see the difference, consider now what will happen if we try to retrieve 25. When 25 was inserted, its probing sequence took it to position 6. The same will happen now, during retrieval. When 25 was inserted, position 6 was occupied by 35, so 25's probing sequence continued past position 6, ultimately ending at position 9. We must arrange for the same probing sequence to occur now, even though some of the positions in the original probing sequence are no longer occupied. That is why we must distinguish between DELETED positions and EMPTY ones. In our search for 25 we must not stop at position 6; therefore we must not mark it as EMPTY.

**iMalc** · 07-10-2012

Checking every bucket upon remove is slow compared to only checking from the position the item hashes to up to the first empty bucket. But when you do that, you need to know the difference.
Think about the behaviour when you add items with a duplicate hash and then remove those items in a different order.

**monkey_c_monkey** · 07-26-2012

I still don't see it. let's say my table (array) is size 10, and the keys I want to insert are: 3,13,23,33,43.

so if I use linear probing, it's inserted as follows:

1)hash(3) = 3 % size so insert @ index 3 so current table:

Code:

int table[10] = {0,0,0,3,0,0,0,0,0,0}

2)hash(13) = 13 % size so insert @ index 3, BUT it's occupied so insert @ (3 + 1)%size so index 4:

Code:

int table[10] = {0,0,0,3,13,0,0,0,0,0}

3)hash(23) = 23 % size so insert @ index 3, BUT it's occupied so insert @ (3 + 1) % size BUT also taken, so (3 + 2 ) % size so index 5:

Code:

int table[10] =  {0,0,0,3,13,23,0,0,0,0}

then hash(33), hash(43) so

Code:

int table[10] = {0,0,0,3,13,23,33,43,0,0}

Now to remove 23:
==============
1) hash(33) to jump to where key 33 should be: so 33 % size = 3, not there, then check (3 + 1)%size so index 4, not there etc until we find it @ index 6. So why not just set the value to 0 which means 'empty' rather than have to disgtinguish w/ a marker (let's say -1 to imply a deleted index)? Let's say I use -1 for deleted, rather than 0 (to indicate empty slot) so:

Code:

int table[10] =  {0,0,0,3,13,23,-1,43,0,0}

Now search 33:
===========
1) hash(33) to jump to where key 33 should be: so 33 % size = 3, not there, etc until we hit index 6: -1 (marker to mean: 'deleted'), so we conclude key is NOT in list rather than having to traverse entire table (which would be O(n) time which is not good), but how would can we just assume that just b/c we hit a deleted index (marker of -1 @ index 6 in this case), that the key wasn't in list. Index 6 could've stored a key of 83. That's why I am a little confused. I really don't see how removal of a key makes it so detrimental to the data structure. All sources (online and books) all point out importance to disgtinguish b/t 'delete' vs 'empty' (so what the indices are initially).

Here is the quote from Eternally confuzzled:

Removal in any open addressing strategy is where things get interesting. A direct removal is not possible because future probes could rely on the key being removed, and if an empty bucket is created where a full bucket once was, searches will incorrectly fail. All in all, the obvious way to remove from a hash table with open addressing will destroy the data structure and cause it to work improperly. With linear probing, it is possible to remove a key and then re-insert each of the keys in the same cluster, but that solution is somewhat complicated.
A more general solution that works for all of the open addressing schemes is to mark a bucket as deleted in such a way so that searches will not fail.

here is the source:Eternally Confuzzled - Hash Table Tutorial

Appreciate all the help to see what I am missing here. If someone can point out what I am not getting, that would make my day.

**smokeyangel** · 07-26-2012

You have it the wrong way round!

Finding a DELETED entry means you have to keep looking. Finding an EMPTY entry means that the element is definitely not there and you can stop.

Originally Posted by monkey_c_monkey

Code:

int table[10] =  {0,0,0,3,13,23,-1,43,0,0}

Now search 33:
===========
1) hash(33) to jump to where key 33 should be: so 33 % size = 3, not there, etc until we hit index 6: -1 (marker to mean: 'deleted'), so we conclude key is NOT in list rather than having to traverse entire table (which would be O(n) time which is not good), but how would can we just assume that just b/c we hit a deleted index (marker of -1 @ index 6 in this case), that the key wasn't in list. Index 6 could've stored a key of 83. That's why I am a little confused. I really don't see how removal of a key makes it so detrimental to the data structure. All sources (online and books) all point out importance to disgtinguish b/t 'delete' vs 'empty' (so what the indices are initially).

This would be:
3: occupied, no match
4: occupied, no match
5: occupied, no match
6: deleted (keep going)
7: occupied, no matrch
8: empty. Stop.

We can stop on empty because the 33 would have been stored there if it existed at all. Suppose you just used 0 for deleted:

Code:

int table[10] =  {0,0,0,3,13,23,0,43,0,0}

If you searched for 43, you'd stop at 6 if were considering empty to mean you can stop. Without marking things as deleted you'd have to look at every element (O(n), as you say).

**Elysia** · 07-26-2012

If you use 0 as a marker for empty, then you can't store the value 0 in your hash table (how are you supposed to distinguish between an inserted 0 and an empty element?).

Thread: importance of distinguishing 'deleted' vs 'empty' indices via open addressing

Thread Tools

Search Thread

Display

importance of distinguishing 'deleted' vs 'empty' indices via open addressing

Similar Threads

Resize array values of "empty" indices

distinguishing executable files

open empty file

can any body help in distinguishing C++ and VC++

Distinguishing types