Thread: Efficent string comparisons.

  1. #1
    Registered User ~Kyo~'s Avatar
    Join Date
    Jun 2004
    Posts
    320

    Efficent string comparisons.

    I am looking for a time efficent way of taking different command strings so proper functions get called.

    if....else seems to be worst case

    switch(first char of command) might be viable, but would still have if....elses. I could use something like a key in the first place which might be viable. I am wondering if anyone knows an efficent way to do this as it is intended to be used on a server with lots of incoming data.

  2. #2
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    map - C++ Reference std::string as key and int/function pointer as value.

    You can also write your own comparator, if, for example, many functions have the same prefix.

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    If the list of strings isn't going to change then I suggest setting up a vector< pair<string, MyFuncPtrType> > then sorting that, and lastly performing lookups using lower_bound, comparing iter->first to the input string to make sure an actual match was found.
    This is more efficient space-wise and time-wise than a map, but only because you don't modify the data after the initial setup.

    One important thing though... How many strings are we talking about here?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User ~Kyo~'s Avatar
    Join Date
    Jun 2004
    Posts
    320
    Data will change. The number of stings depends on the number of logins the server will be accepting.

  5. #5
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    Wait. What does "command strings" have to do with "number of logins the server will be accepting"?

    What are doing?

    Soma

  6. #6
    Registered User ~Kyo~'s Avatar
    Join Date
    Jun 2004
    Posts
    320
    I had a strange thought at that time didn't work actually decided to use a struct to store some extra data instead and was considering using the code to go off tells which still would need to lookup off a list of strings for what connection to send out on. Command strings will be something like move to x,y coordinate so it would parse similar to MOV_X_Y and the commands will be parsed as strings. Planning on using strtok to split it up so we will have commands like MOV, CHT, LOGIN, EQUIP, CRAFT, USE, ATTACK, etc. I can say login can be low priority and things like that, but if I get a sufficently large pool of valid commands I need a way so a thread doesn't take a long time to execute the command.

  7. #7
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    You have a bit of data, a set of commands as represented by strings, and a set of operations to perform based upon those strings. A map based association is ideal.

    Of course, you may be able to speed this up, in the general case, by using hashing function (even a perfect hash since you have a known and finite set of strings) to reduce command strings to an integer before looking them up in your map.

    Soma

  8. #8
    Registered User ~Kyo~'s Avatar
    Join Date
    Jun 2004
    Posts
    320
    Would using a divide and conquer deal work as well? I know it is fairly efficent. IDK what it is really called, but the idea is to guess a number between 1 and 100 you guess 50 and keep halving untill you have your solution. Is hashing faster than this? I am thinking about maybe using strcmp() to accomplish this as it will give me the bigger or smaller kind of feedback. I already use this to lookup in a database for my equipment stats as follows

    Code:
    Weapon & operator|(Item &other)
      {
         Item::operator|(other);    //generic to specific assignment.
         int lower = 0, upper = nmbr;
         int guess = upper/2;
    
         do
         {
             if(Get_ID() < infolist[guess]->id)
             {
                 upper = guess;
             }
             if(Get_ID() > infolist[guess]->id)
             {
                 lower = guess;
             }
             if(guess == (int)((lower + upper)/2))guess--;
             else guess = (lower + upper)/2;
         }while(Get_ID() != infolist[guess]->id);
    
         min_dmg = infolist[guess]->mdmg;
         max_dmg = infolist[guess]->xdmg;
         hands = infolist[guess]->hands;
         speed = infolist[guess]->speed;
         range = infolist[guess]->range;
      }

  9. #9
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    This is what maps do (access is logarithmic in time) and hashing is something that may speed it up even more. You needed command priorities for if-else approach.

    Just use map and when you notice that this is a performance issue, then think about hashing and other optimizations.
    Last edited by kmdv; 05-16-2011 at 10:37 PM.

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by ~Kyo~ View Post
    Would using a divide and conquer deal work as well? I know it is fairly efficent. IDK what it is really called, but the idea is to guess a number between 1 and 100 you guess 50 and keep halving untill you have your solution. Is hashing faster than this? I am thinking about maybe using strcmp() to accomplish this as it will give me the bigger or smaller kind of feedback. I already use this to lookup in a database for my equipment stats as follows

    Code:
    Weapon & operator|(Item &other)
      {
         Item::operator|(other);    //generic to specific assignment.
         int lower = 0, upper = nmbr;
         int guess = upper/2;
    
         do
         {
             if(Get_ID() < infolist[guess]->id)
             {
                 upper = guess;
             }
             if(Get_ID() > infolist[guess]->id)
             {
                 lower = guess;
             }
             if(guess == (int)((lower + upper)/2))guess--;
             else guess = (lower + upper)/2;
         }while(Get_ID() != infolist[guess]->id);
    
         min_dmg = infolist[guess]->mdmg;
         max_dmg = infolist[guess]->xdmg;
         hands = infolist[guess]->hands;
         speed = infolist[guess]->speed;
         range = infolist[guess]->range;
      }
    That's simply a slow and buggy version of a binary search. If you want a fast and non-buggy version you use std::lower_bound. I've written an article about what's inefficient with the way you have done it, on my website.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #11
    Registered User ~Kyo~'s Avatar
    Join Date
    Jun 2004
    Posts
    320
    infolist is not just an array of ints or something... I did read the page though.
    Code:
    struct weapon_info{int id;int mdmg;int xdmg;int hands;int speed;int range;};
    weapon_info** Weapon::infolist = NULL;
    I do have a question though I know when you create a thread you pass a void pointer to a entry point for said thread could I make a struct with a void pointer to a function and have the string also in the struct. So basicly using the string as the key then use the function pointer to execute the function. Is something like this even possible?

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by ~Kyo~ View Post
    infolist is not just an array of ints or something... I did read the page though.
    The type of item in the array being binary searched doesn't affect it's being a binary search.

    Don't use void pointers in C++.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  13. #13
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    I came up with a command lookup for a MUD-ish game I wrote that basically built a command tree built of nodes that were defined like:
    Code:
    struct command_node
    {
      char letter;
      struct command_node *branches[26];
      void (*func)();
    }
    Say you had the following commands: LOGON, LOGOFF, LOOK, LOOP

    The tree would end up looking something like this:
    Code:
                  L
                 /
                O
               / \
              G   O
             /   / \
            O   K   P
           / \
          N   F
             /
            F
    So to lookup a command you'd just loop through the user's input string and walk through the tree:
    Code:
    for each letter in input string
      if treeroot->branches[letter] is not NULL, step into that node
    ...until you either came to a node->branches[letter] that was NULL indicating an unknown command or the end of the user's input. If the current node had a valid function pointer, it was called to handle the command. It's a little inefficient memory-wise to keep track of the command tree, but lookups were extremely fast. It definitely beat matching against each command, especially since the program supported about 150 different commands. Looking back, a hash table probably would be a lot simpler to implement.
    If you understand what you're doing, you're not learning anything.

  14. #14
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    So... you created a variant on a "Trie" then?

    That's kind of a weird choice for this, but that's not why I'm responding to your post.

    For the future, you may want to look into a more space saving form by sharing prefixes across nodes or packing shared prefixes into the parent.

    Essentially, you would, for example, change your node element to an array of elements and share that, bridge it, or merge it into a parent.

    Consider this simple change in structure.

    Code:
    struct command_node
    {
      char * letter;
      struct command_node *branches[26];
      void (*func)();
    };
    Where instead of the two nodes `G' (L->O->G) and `O' (L->O->G->O) each with its own `sizeof(node)' storage, you only have the node `GO' (L->O->GO) saving the obvious `sizeof(node)' storage, reducing branching, and somewhat increasing locality or reference.

    Common names for these space saving forms are "prefix tree", "julian tree", and "radix tree".

    Soma

  15. #15
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Yeah, I don't know what it's called. It came from my brain. If it's a variant on something, then whatever.

    I considered your idea for the multiple-letters-per-node approach, but had decided against it because the program had the capability to load addons dynamically and insert new commands into the tree. Adding that prefix stuff added a level of complexity to command insertion that I didn't want to get into. And really, the whole tree took maybe a few dozen kilobytes of memory, which is pretty trivial. If we were talking about hundreds of thousands of commands, then sure.
    If you understand what you're doing, you're not learning anything.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Most efficent data type to work with?
    By h3ro in forum Tech Board
    Replies: 6
    Last Post: 07-04-2008, 06:47 PM
  2. Illegal string comparisons working?
    By Oysterman in forum C Programming
    Replies: 3
    Last Post: 05-16-2007, 06:34 PM
  3. Array Comparisons
    By B.C.Lioness in forum C++ Programming
    Replies: 8
    Last Post: 03-20-2004, 07:37 PM
  4. Replies: 2
    Last Post: 12-03-2001, 02:32 AM
  5. need HELP!!!!!!!!!C++ COMPARISONS
    By awakicee in forum C++ Programming
    Replies: 7
    Last Post: 10-01-2001, 08:08 PM