Thread: importance of distinguishing 'deleted' vs 'empty' indices via open addressing

  1. #1
    Registered User
    Join Date
    Jul 2012
    Posts
    87

    importance of distinguishing 'deleted' vs 'empty' indices via open addressing

    Why is it so important to mark the difference? Why can't an item found just have its index marked as empty. For my trivial case, I'm thinking of initialize an array w/ empty signified by the value 0.

    so:
    Code:
    int list[10] = {0,0,0,0,0,0,0,0,0,0};
    So the function for removing an item could be:
    Code:
    int open_address_remove(int* list, int size; int key, int data)
    {
      for ( int i = 0; i < size; ++i )
      {
        int cur_bucket = hash(key, i);
    
        if ( list[cur_bucket] == data )
        {
           list[cur_bucket] = 0;//Why do books, online tutorials make it so important to distinguish b/t empty and deleted indices?
          return cur_bucket;//return where the item to removed was found
         }
      }
    
      return -1;//return -1 to represent item-to-remove not found
    }
    Here is one very helpful link I used, it says to indicate a stop condition is when position == hash(key) but for my code above, coming full circle (so ensuriing all indices checked) is by the counter i so why this importance as this tutorial makes.
    http://webdocs.cs.ualberta.ca/~holte/T26/open-addr.html

    What am I not getting here? Appreciate all the help I can get.

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Whether it is important to distinguish between an object in some kind of "empty" state and some kind of "deleted" state depends on what you are doing. In fact, in your array of 10 elements, all the elements exist. It is merely by some convention or designation that you regard some elements as "empty" or "deleted" and others as "in use".
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    In the context of that lecture, I think the distinction is necessary because of the implementation of the lookup function. To look up a value at a specific key, you need to check hash(key,0), then all subsequent keys until an empty one is found or the . Deleted key need to be skipped over, but do not mean that the element is not found.

    As an optimisation, you could mark list[cur_bucket] as empty when list[cur_bucket+1] is empty.

    EDIT 3:
    At the bottom of the link you posted:
    Suppose now we DELETE 35. It is hashed to position 5, which is occupied, and, using linear probing, we look next in position 6, where we find what we are looking for. Position 6 is marked as DELETED, not EMPTY.

    To see the difference, consider now what will happen if we try to retrieve 25. When 25 was inserted, its probing sequence took it to position 6. The same will happen now, during retrieval. When 25 was inserted, position 6 was occupied by 35, so 25's probing sequence continued past position 6, ultimately ending at position 9. We must arrange for the same probing sequence to occur now, even though some of the positions in the original probing sequence are no longer occupied. That is why we must distinguish between DELETED positions and EMPTY ones. In our search for 25 we must not stop at position 6; therefore we must not mark it as EMPTY.
    Last edited by King Mir; 07-10-2012 at 12:22 AM.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  4. #4
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Checking every bucket upon remove is slow compared to only checking from the position the item hashes to up to the first empty bucket. But when you do that, you need to know the difference.
    Think about the behaviour when you add items with a duplicate hash and then remove those items in a different order.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  5. #5
    Registered User
    Join Date
    Jul 2012
    Posts
    87
    I still don't see it. let's say my table (array) is size 10, and the keys I want to insert are: 3,13,23,33,43.

    so if I use linear probing, it's inserted as follows:

    1)hash(3) = 3 % size so insert @ index 3 so current table:
    Code:
    int table[10] = {0,0,0,3,0,0,0,0,0,0}
    2)hash(13) = 13 % size so insert @ index 3, BUT it's occupied so insert @ (3 + 1)%size so index 4:
    Code:
    int table[10] = {0,0,0,3,13,0,0,0,0,0}
    3)hash(23) = 23 % size so insert @ index 3, BUT it's occupied so insert @ (3 + 1) % size BUT also taken, so (3 + 2 ) % size so index 5:
    Code:
    int table[10] =  {0,0,0,3,13,23,0,0,0,0}
    then hash(33), hash(43) so
    Code:
    int table[10] = {0,0,0,3,13,23,33,43,0,0}
    Now to remove 23:
    ==============
    1) hash(33) to jump to where key 33 should be: so 33 % size = 3, not there, then check (3 + 1)%size so index 4, not there etc until we find it @ index 6. So why not just set the value to 0 which means 'empty' rather than have to disgtinguish w/ a marker (let's say -1 to imply a deleted index)? Let's say I use -1 for deleted, rather than 0 (to indicate empty slot) so:
    Code:
    int table[10] =  {0,0,0,3,13,23,-1,43,0,0}
    Now search 33:
    ===========
    1) hash(33) to jump to where key 33 should be: so 33 % size = 3, not there, etc until we hit index 6: -1 (marker to mean: 'deleted'), so we conclude key is NOT in list rather than having to traverse entire table (which would be O(n) time which is not good), but how would can we just assume that just b/c we hit a deleted index (marker of -1 @ index 6 in this case), that the key wasn't in list. Index 6 could've stored a key of 83. That's why I am a little confused. I really don't see how removal of a key makes it so detrimental to the data structure. All sources (online and books) all point out importance to disgtinguish b/t 'delete' vs 'empty' (so what the indices are initially).

    Here is the quote from Eternally confuzzled:
    Removal in any open addressing strategy is where things get interesting. A direct removal is not possible because future probes could rely on the key being removed, and if an empty bucket is created where a full bucket once was, searches will incorrectly fail. All in all, the obvious way to remove from a hash table with open addressing will destroy the data structure and cause it to work improperly. With linear probing, it is possible to remove a key and then re-insert each of the keys in the same cluster, but that solution is somewhat complicated.
    A more general solution that works for all of the open addressing schemes is to mark a bucket as deleted in such a way so that searches will not fail.
    here is the source:Eternally Confuzzled - Hash Table Tutorial

    Appreciate all the help to see what I am missing here. If someone can point out what I am not getting, that would make my day.

  6. #6
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    You have it the wrong way round!

    Finding a DELETED entry means you have to keep looking. Finding an EMPTY entry means that the element is definitely not there and you can stop.

    Quote Originally Posted by monkey_c_monkey View Post
    Code:
    int table[10] =  {0,0,0,3,13,23,-1,43,0,0}
    Now search 33:
    ===========
    1) hash(33) to jump to where key 33 should be: so 33 % size = 3, not there, etc until we hit index 6: -1 (marker to mean: 'deleted'), so we conclude key is NOT in list rather than having to traverse entire table (which would be O(n) time which is not good), but how would can we just assume that just b/c we hit a deleted index (marker of -1 @ index 6 in this case), that the key wasn't in list. Index 6 could've stored a key of 83. That's why I am a little confused. I really don't see how removal of a key makes it so detrimental to the data structure. All sources (online and books) all point out importance to disgtinguish b/t 'delete' vs 'empty' (so what the indices are initially).
    This would be:
    3: occupied, no match
    4: occupied, no match
    5: occupied, no match
    6: deleted (keep going)
    7: occupied, no matrch
    8: empty. Stop.

    We can stop on empty because the 33 would have been stored there if it existed at all. Suppose you just used 0 for deleted:

    Code:
    int table[10] =  {0,0,0,3,13,23,0,43,0,0}
    If you searched for 43, you'd stop at 6 if were considering empty to mean you can stop. Without marking things as deleted you'd have to look at every element (O(n), as you say).

  7. #7
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    If you use 0 as a marker for empty, then you can't store the value 0 in your hash table (how are you supposed to distinguish between an inserted 0 and an empty element?).
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Resize array values of "empty" indices
    By monkey_c_monkey in forum C++ Programming
    Replies: 2
    Last Post: 07-09-2012, 01:22 PM
  2. distinguishing executable files
    By smooth in forum C Programming
    Replies: 8
    Last Post: 12-17-2008, 10:02 AM
  3. open empty file
    By mystic-d in forum C Programming
    Replies: 2
    Last Post: 11-16-2007, 10:27 AM
  4. can any body help in distinguishing C++ and VC++
    By bhagwat_maimt in forum C++ Programming
    Replies: 6
    Last Post: 11-15-2006, 04:03 AM
  5. Distinguishing types
    By tony_chestnut in forum C++ Programming
    Replies: 4
    Last Post: 10-03-2006, 03:58 PM