Thread: Hashing algo

  1. #1
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607

    Hashing algo

    I'm quite confused on using a hash algo to come up with an index into an array. Prelude said something in another thread about a classic AND,XOR and OR system that works but I've never heard of this.

    I essentially have a static array that is built using macros (think MFC's message map system) and I need a way to index into the array to find a handler for the message based on it's ID. Iterating the static array would take time and I'm not sure the game engine could do it fast enough to be useful.

    The MFC source code takes the LOWORD of the static array's address, XORs it with the message ID and blah blah blah somehow comes out with an index. I have no idea how taking an address of a static array and doing anything with it can get you an index into the array since the array address is dependent on the machine state and OS at run-time. Quite odd. And looking at the source for MFC more than gives me a headache.

    I will check out Prelude's site on hashing but thought I'd ask here anyways.

    Help Bubba plz.

  2. #2
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Basically you pick some arbitrary number, and throw in a bunch of XORs and ANDs, to give you some pretty much random, but repeatable number to use for your index. Let's see...
    Code:
    hashtable_t ourtable[ SOMESIZE ];
    ...
    index_t = index( thisentry, ourtable );
    ...
    
    index( entry_t *thisentry, hash_t *ourtable )
    {
        ...
        index_t  here = (ourtable XOR thisentry->somefield & SOMETHINGFUN) % SOMESIZE;
        add( thisentry, ourtable, here );
        ...
        return here;
    }
    At least that's what I usually do. I've never had any real need to have some super complex hash scheme. Just XOR, AND, and MOD it to fit. Then handle collisions.


    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    XOR AND and MOD eh.

    Hmm.

    Will give it a try and figure out why it works. Thanks Quzah. You're so kind.

  4. #4
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Ok more about hashing. I read Prelude's site and now I'm eternally confuzzled.

    Here is what we have. In our script system we wanted a way to let the user or programmer respond to messages w/o having to write a function for every single script command. Essentially if the handler has not been created, the message is not handled.

    We tore into the MFC code that does this, much to our chagrin, and figured out how they did it.
    The macro they use to declare the message map actually emits code into the classes for the static array that contains the message map information. The macros in the cpp file for the class emit code into the class and then begins a static array buildup. Essentially when you do an ON_COMMAND() or something else, those macros are just adding to an array. Kind of like this:

    Code:
    typedef void (*CDummyClass::MSG_MAP)(void);
    
    struct MESSAGE_MAP_ENTRY
    {
      UINT nMessageID;
      MSG_MAP pfn;
    };
    
     
    struct MESSAGE_MAP
    {
      MESSAGE_MAP_ENTRY *pEntries;
      MESSAGE_MAP *pBaseMap;
    };
    
    
    #define DECLARE_MESSAGE_MAP() \
    protected: \
      static const MESSAGE_MAP_ENTRY *m_pEntries; \
      static const MESSAGE_MAP *m_pMsgMap; \
    public: \
      MESSAGE_MAP*GetMessageMap(); \
      
    ...
    class MyClass:public MyBase
    {
      DECLARE_MESSAGE_MAP();
    };
    Code:
    #define BEGIN_MESSAGE_MAP(theClass,baseClass) \
    MESSAGE_MAP *GetMessageMap() {return &theClass::m_pMsgMap;} \
    theClass::m_pFinalMap={baseClass::m_pMsgMap,m_pEntries[0] }; \
    const MESSAGE_MAP theClass::m_pEntries[] = { 
    #define ON_COMMAND(id,memberFxn) \
    {id,(MSG_MAP)memberFxn},
    
    ....
    So as you can see they are just having you build up a static array of message map entries by using those macros. It's ugly and it gets worse but it doesn't add to this discussion.

    Now here is their hash algo. Sadly MS doesn't believe in commenting code except every 100 or so lines so it's a bit of a mess.

    Code:
    ...
    const AFX_MSGMAP* pMessageMap; pMessageMap = GetMessageMap();
    UINT iHash; iHash = (LOWORD((DWORD_PTR)pMessageMap) ^ message) & (iHashMax-1);
    winMsgLock.Lock(CRIT_WINMSGCACHE);
    AFX_MSG_CACHE* pMsgCache; pMsgCache = &_afxMsgCache[iHash];
    const AFX_MSGMAP_ENTRY* lpEntry;
    if (message == pMsgCache->nMsg && pMessageMap == pMsgCache->pMessageMap)
    {
      // cache hit
      lpEntry = pMsgCache->lpEntry;
      winMsgLock.Unlock();
      if (lpEntry == NULL) return FALSE;
    
      // cache hit, and it needs to be handled
      if (message < 0xC000)
       goto LDispatch;
      else
       goto LDispatchRegistered;
    }
    ....
    ....
    Don't yell at me I didn't code it and I know it's ugly but it's MS. Now how is this going to determine if a message with a certain ID is in the static array or not? The ID could be anywhere in the static array or not in it at all. There must be something in the _afxMsgCache[] array to get this all to work. I'm quite lost on most of it.

    Help.

    EDIT:

    Another question. How would you overload the assignment operator relative to this:

    Code:
    struct MESSAGE_MAP
    {
      MESSAGE_MAP_ENTRY *pEntries;
      MESSAGE_MAP *pBaseMap;
    };
    Isn't this going to cause a stack overflow?

    Code:
    ...
    friend MESSAGE_MAP &operator=(const MESSAGE_MAP &map)
    {
      pEntries=map.pEntries;
    
      //This line confuses me - won't this just call this operator we are in...again and again and again?
      pBaseMap=map.pBaseMap;
    
      return *this;
    }
    Last edited by VirtualAce; 09-13-2006 at 02:14 AM.

  5. #5
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    We probably want to shove this over onto the C++ forum.
    Quote Originally Posted by Bubba
    Here is what we have. In our script system we wanted a way to let the user or programmer respond to messages w/o having to write a function for every single script command. Essentially if the handler has not been created, the message is not handled.

    We tore into the MFC code that does this, much to our chagrin, and figured out how they did it.
    Yeah I was going to do the same thing for a project which never got finished (or really much of a start) for the contest board. Everyone would include in their package a header file which included their function's name, and some other relevant info, which all would go into an array. It would end up something like:
    Code:
    struct stuff
    {
    } ai[] =
    {
    #include "mine.h"
    #include "yours.h"
    #include "theirs.h"
    };
    That way I'd just have to add an include line for their header file and compile, which would save me having to find out what everyone named their functions...

    Quote Originally Posted by Bubba
    The macro they use to declare the message map actually emits code into the classes for the static array that contains the message map information. The macros in the cpp file for the class emit code into the class and then begins a static array buildup. Essentially when you do an ON_COMMAND() or something else, those macros are just adding to an array. Kind of like this:
    Nod. See above.
    Quote Originally Posted by Bubba
    Now here is their hash algo. Sadly MS doesn't believe in commenting code except every 100 or so lines so it's a bit of a mess.
    I'll comment it from what I can gather.
    Code:
    ...
    const AFX_MSGMAP* pMessageMap; pMessageMap = GetMessageMap(); /* Point at their entry. */
    UINT iHash; iHash = (LOWORD((DWORD_PTR)pMessageMap) ^ message) & (iHashMax-1);
            /* 
            - Use the value of their entry,
            - XOR it with whatever 'message' is.
            - In effect, MOD it with the size of our hash table.
                  ( Not exactly, but close enough. )
            - We now have some value that will fit in our table.
            */
    winMsgLock.Lock(CRIT_WINMSGCACHE); /* No idea. */
    AFX_MSG_CACHE* pMsgCache; /* a pointer to the type our table contains probably */
    pMsgCache = &_afxMsgCache[iHash]; /* Point to the bucket whose hash we just found. */
    const AFX_MSGMAP_ENTRY* lpEntry;
    
    /* If the message is already there, and the map is the same... */
    if (message == pMsgCache->nMsg && pMessageMap == pMsgCache->pMessageMap)
    {
      // cache hit
      lpEntry = pMsgCache->lpEntry;
      winMsgLock.Unlock();
      if (lpEntry == NULL) return FALSE;
    
      // cache hit, and it needs to be handled
      if (message < 0xC000)
       goto LDispatch;
      else
       goto LDispatchRegistered;
    }
    ....
    ....
    That's about it as far as I can tell. I assume somewhere in the last part, they handle collisions.
    Quote Originally Posted by Bubba
    Don't yell at me I didn't code it and I know it's ugly but it's MS. Now how is this going to determine if a message with a certain ID is in the static array or not? The ID could be anywhere in the static array or not in it at all. There must be something in the _afxMsgCache[] array to get this all to work. I'm quite lost on most of it.

    Help.
    Basically, let's say our message is 10, and our array has N entries.
    Code:
    ...
    ourtable[ N ];
    size_t here;
    
    here = message(10) ^ X & (N-1);
    Put whatever you want in as X. XOR the message by that number. Say it gives us 33. Take 33 & it by N-1. Say that gives us 7. Since we're AND-ing with a value smaller than N, we know that whatever value it is, it will fit into our array. It will be either 0, 1, 2 ... N-1. Since N-1 is the upper bound of our array, we're ok to access whatever it is.

    Now check to see if that spot is set to NULL. If it is, nothing is there. If it isn't, something is there. Check that with our message or whatever. If it's not the same, then it's a different message, meaning we have a collision.

    That means two or more things share the same spot. How you handle collisions is up to you. Let's say our hash table is an array of linked lists. We find which "bucket" it falls in, and if it's not at the top, when we work down the list until we find it. If we haven't found it, by the time we reach the end, then we have a message we don't know how to handle.

    Anyway that'd be how you'd check collisions in an array of linked lists. I'm not sure what they do, but they'll have some method to in effect do the same thing.

    Quote Originally Posted by Bubba
    EDIT:

    Another question. How would you overload the assignment operator relative to this:
    No idea. I'm not a C++ person. Let's move it there and someone can probably better help.


    Quzah.
    Last edited by quzah; 09-13-2006 at 06:03 AM. Reason: line wrapping
    Hope is the first step on the road to disappointment.

  6. #6
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Thanks a bunch Quzah I'm beginning to understand this. I don't like all the macros and crap but it is a powerful way to do message response. However, it is quite confusing. I tried using a std::vector and then just doing push_back(object(params)) but the vector came back as an unresolved external symbol even if I stuck it in the class myself. The only way this pre-processor mumbo jumbo works is using static arrays.

    Why can't I use a static vector or static STL container?

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    If I remember correctly (I used to spend a lot of time looking at MFC's system) and interpret the snippet you posted correctly, you're misunderstanding the MFC system. The hash-stuff is used for looking up handlers in a cache that is independent of the static message map.
    If there is no hit in the cache (i.e. the message hasn't recently been handled already), a linear search on the static message map is performed.
    The cache seems to be shared between all message maps of a class and its bases, which is why the XOR the message ID with the address of the message map.


    Try using the Boost.Assign library together with macros, it might help you with generating code that builds up an STL container.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Creating Linked List Using Hashing and Stack
    By m0ntana in forum C Programming
    Replies: 2
    Last Post: 04-07-2007, 07:11 AM
  2. Hashing, indexing and phonetic normalization for approximate str matching...
    By biterman in forum A Brief History of Cprogramming.com
    Replies: 0
    Last Post: 11-21-2006, 09:42 AM
  3. Replies: 8
    Last Post: 09-11-2006, 11:26 AM
  4. Double Hashing
    By carrja99 in forum C++ Programming
    Replies: 1
    Last Post: 03-28-2003, 08:36 AM
  5. hashing
    By condorx in forum C++ Programming
    Replies: 2
    Last Post: 01-31-2003, 09:36 PM