Syntax parsing and Dictonaries

**@nthony** · 05-26-2007

I'm currently working on a simplistic command interpreter which simply reads a string and decides what command it is. Up until recently, the command set was small so I had been using the naive approach of checking the given command against an array of possible commands one at a time. However since the commands are strings, this is highly ineffecient with a large number of commands.
I'm thinking a Hashmap/Dictionary approach might be a classical solution to this, however I am looking for an even more effecient solution: Since the set of commands are fixed, I am looking to build a Dictionary that has no-unused space and zero-collision. Wiki tells me this is called a Perfect Hash, however I am having trouble finding such a function. If I try to create a Hashmap with no-unused I get the potential for high-collision, and vice-verca.
Basically what I'm asking is what is the most effecient data structure I can use to store and look-up strings from a fixed pre-known set, using minimal space and time (preferably both, but the priority is on time)?
Some solutions:
Syntax Tree (trie) - but look-ups are not fast since (worst-case scenrio) each character must be compared with n values (n = number of commands).
Dictionary - look-ups are fast only if a large amount of space is used to prevent collisions.

**IsmAvatar2** · 05-26-2007

In a Dictionary of unknown size N, with unknown length M (worst case), provided it is sorted, you can do a lookup in M*log2(N) worst-case and usual case. In a poorly defined hash, the number of lookups, worst-case, would be M*N.

I'm not sure what a Syntax Tree is, but if you mapped out the alphabet on a tree (root has 26 branches - unless a letter is not used), and so on, mapping a tree of the dictionary's spellings, worst case would then be M, and this is achieved each time it actually finds the word. Best case would be 1, where it looks up the first letter and finds that it is unique from the dictionary (the word does not exist in the dictionary). The problem with this is that the memory usage would probably be rather large, since each letter in M is paired with 26 pointers, so worst case would probably be 26cM (probability/statistics "c", approaching "power", as in 26^M - although I'm probably wrong here, but it gives you an idea that it indeed would easily take up a lot of memory), but it'd be highly unlikely to have that since that means your dictionary contains every possible combination of 26 letters choose M.

Also realize that a lot of languages compile to machine code first (such as C), and then run, as opposed to read-and-run, which is usually only done via a debugger. Therefor, the compiler can be relatively slow because it's not doing the actual running, it's just converting the code to a data structure that can be looked up more quickly for the runner, eg converting english words into a full tree (usually of numbers of size L, so every possible combination of numbers is used up to L-1, plus a few on the end). That way, the commands can then be stored in an array of size L and looked up by index.

**@nthony** · 05-26-2007

Yes, that's the problem with tree's I'm having.. potentially there can be 26 comparisons on every node level if commands have similar spelling.
In light of what you said about compilers, I guess I would be more interested on the implementation scripting languages would use to parse and look-up string commands.

Thread: Syntax parsing and Dictonaries

Thread Tools

Search Thread

Display

Syntax parsing and Dictonaries