Designing a chatbot

**Neo1** · 01-10-2012

So I'm working on a chatbot project, or at least I've reached the stage where I'm thinking about how to implement it.

It's going to be an ELIZA-like chatbot; that is, it will have a list of keywords and a corresponding list of responses, one for each of those keywords.

Right now I'm working on the user input. I want it to be as flexible as possible. First of all, I've decided to convert all user input to lower case and to store the keywords only in lower case. The plan is to split the sentence the user entered into words, go through each word and compare it to each keyword using the Levenshtein distance, and then pick the keyword with the lowest distance.
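A minimal C sketch of that matching step, for illustration only. It assumes words and keywords are at most `MAX_WORD` (a hypothetical limit of 32) characters, that anything which is not a letter separates words, and that keywords are already stored in lower case. `levenshtein` uses the standard two-row dynamic-programming formulation; `best_keyword` is a hypothetical helper name, and on ties it keeps the first keyword found.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 32  /* hypothetical upper bound on word/keyword length */

/* Levenshtein distance with two rows: prev[j] is the distance between the
   first i-1 chars of a and the first j chars of b; cur is the row being filled. */
static size_t levenshtein(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t prev[MAX_WORD + 1], cur[MAX_WORD + 1];

    assert(la <= MAX_WORD && lb <= MAX_WORD);
    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;                          /* "" -> b[0..j): j insertions */
    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;                           /* a[0..i) -> "": i deletions */
        for (size_t j = 1; j <= lb; j++) {
            size_t del = prev[j] + 1;
            size_t ins = cur[j - 1] + 1;
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t m = del < ins ? del : ins;
            cur[j] = m < sub ? m : sub;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Splits input into words (runs of letters), lower-cases each word, and
   returns the index of the keyword closest to any input word, storing that
   distance in *best_dist. Returns -1 if the input contains no words. */
static int best_keyword(const char *input, const char *const keywords[],
                        size_t nkeywords, size_t *best_dist)
{
    int best = -1;
    *best_dist = (size_t)-1;

    while (*input) {
        char word[MAX_WORD + 1];
        size_t n = 0;

        while (*input && !isalpha((unsigned char)*input))
            input++;                          /* skip spaces and punctuation */
        while (isalpha((unsigned char)*input)) {
            if (n < MAX_WORD)                 /* silently truncate long words */
                word[n++] = (char)tolower((unsigned char)*input);
            input++;
        }
        if (n == 0)
            break;                            /* only separators were left */
        word[n] = '\0';

        for (size_t k = 0; k < nkeywords; k++) {
            size_t d = levenshtein(word, keywords[k]);
            if (d < *best_dist) {
                *best_dist = d;
                best = (int)k;
            }
        }
    }
    return best;
}
```

For example, with the keywords `{"hey", "bye", "weather"}`, the input "whats the wether like" picks "weather" at distance 1.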

The thing is, I need a threshold: if the user enters a sentence where the best match is at a distance of 15, the bot will still pick that response, and it will seem like the chatbot is on shrooms. So I've now added a check: if the best match has a distance of more than 3, the bot prints "I don't know what you mean." instead of the response it would otherwise have chosen.
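The threshold check itself is small. A sketch, assuming the best match has already been found (as a keyword index, or -1 if the input had no words) together with its distance; `choose_reply`, `MAX_DISTANCE` and `FALLBACK_REPLY` are hypothetical names.

```c
#include <stddef.h>

#define MAX_DISTANCE 3  /* the threshold described above: anything further is rejected */

static const char FALLBACK_REPLY[] = "I don't know what you mean.";

/* Returns the response for the best-matching keyword, or the fallback
   if there was no match or the match is too far away to trust. */
static const char *choose_reply(int best_index, size_t best_dist,
                                const char *const responses[])
{
    if (best_index < 0 || best_dist > MAX_DISTANCE)
        return FALLBACK_REPLY;
    return responses[best_index];
}
```

With this rule a best match at distance 15 falls back to "I don't know what you mean.", while a match at distance 3 is still accepted.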

Now the problem is, I have a keyword "hey", which makes the bot respond with "Hello there!". If the user enters "qqq" or something like that, it will match "hey" and the bot greets the user. This is because the distance between "qqq" and "hey" is 3 (the words share no letters, so every letter has to be substituted), which is within the threshold, but that obviously isn't the behaviour I'm looking for. I've tried lowering the threshold to 2, but that is too strict; then I might as well just do a plain string comparison.

So I stumbled across the Soundex algorithm, which is a way to encode names so that similar-sounding names encode to the same thing. Am I missing something, or could this be used for all kinds of text strings, not just names?
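For reference, a sketch of the usual American Soundex rules in C: keep the first letter, code the remaining consonants as digits, collapse adjacent equal digits (including one equal to the first letter's digit), let a vowel separate two equal digits while h and w do not, and pad or truncate to one letter plus three digits. Ignoring non-letters is an assumption of this sketch, and `soundex`/`sounds_alike` are hypothetical names. Nothing in the rules is specific to names, so it runs on any word, though how useful the codes are for ordinary vocabulary is another question.

```c
#include <ctype.h>
#include <string.h>

/* Soundex digit for a lower-case letter. Vowels (and y) give '0', which
   separates runs of equal digits; h and w give 'h', which does not. */
static char soundex_digit(char c)
{
    switch (c) {
    case 'b': case 'f': case 'p': case 'v':           return '1';
    case 'c': case 'g': case 'j': case 'k':
    case 'q': case 's': case 'x': case 'z':           return '2';
    case 'd': case 't':                               return '3';
    case 'l':                                         return '4';
    case 'm': case 'n':                               return '5';
    case 'r':                                         return '6';
    case 'h': case 'w':                               return 'h';
    default:                                          return '0';
    }
}

/* Writes the four-character Soundex code of word (e.g. "R163") into out,
   which must hold at least 5 chars. Non-letters are ignored.
   Returns 0 (leaving an empty string) if word contains no letters. */
static int soundex(const char *word, char out[5])
{
    size_t n = 0;
    char last = 0;

    for (; *word && n < 4; word++) {
        if (!isalpha((unsigned char)*word))
            continue;
        char c = (char)tolower((unsigned char)*word);
        char d = soundex_digit(c);

        if (n == 0) {
            out[n++] = (char)toupper((unsigned char)c); /* keep first letter */
            last = d;                   /* its digit still suppresses repeats */
        } else if (d == '0') {
            last = '0';                 /* vowel: an equal digit after it is coded */
        } else if (d != 'h' && d != last) {
            out[n++] = d;
            last = d;
        }                               /* h/w: skipped, 'last' unchanged */
    }
    if (n == 0) {
        out[0] = '\0';
        return 0;
    }
    while (n < 4)
        out[n++] = '0';                 /* pad short codes with zeros */
    out[4] = '\0';
    return 1;
}

/* Two words "sound alike" if they have the same Soundex code. */
static int sounds_alike(const char *a, const char *b)
{
    char ca[5], cb[5];
    return soundex(a, ca) && soundex(b, cb) && strcmp(ca, cb) == 0;
}
```

Here "hey" encodes to H000 and "qqq" to Q000, so they do not sound alike, while "helo" and "hello" both encode to H400. Because the first letter is kept as-is, a typo in the first letter always changes the code.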

My idea was to keep using the edit-distance check, but also compute the Soundex encoding of the keyword and of the input word, and if they encode to different strings (i.e., the two words don't sound alike), make this factor in somehow in the way the chatbot chooses which keyword to match.
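One possible way to "factor it in", offered as an untuned assumption rather than a known-good rule: add a fixed penalty to the edit distance whenever the two Soundex codes differ, then apply the same threshold as before. `SOUND_PENALTY` (set to 2 here) and both function names are hypothetical; the Soundex codes are assumed to be computed elsewhere and passed in as strings.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DISTANCE  3  /* same threshold as before */
#define SOUND_PENALTY 2  /* hypothetical extra cost when the words don't sound alike */

/* Edit distance, plus a penalty if the Soundex codes differ. */
static size_t combined_score(size_t edit_distance,
                             const char *soundex_input, const char *soundex_keyword)
{
    if (strcmp(soundex_input, soundex_keyword) != 0)
        return edit_distance + SOUND_PENALTY;
    return edit_distance;
}

/* Accept the keyword only if the combined score is within the threshold. */
static int accept_match(size_t edit_distance,
                        const char *soundex_input, const char *soundex_keyword)
{
    return combined_score(edit_distance, soundex_input, soundex_keyword)
           <= MAX_DISTANCE;
}
```

With these numbers, "qqq" against "hey" scores 3 + 2 = 5 and is rejected, "helo" against "hello" (both H400) scores 1 and is accepted, and a first-letter typo such as "jey" (J000) against "hey" (H000) scores 1 + 2 = 3 and is still accepted. The penalty is kept below the threshold so a single-letter typo that happens to change the code is not rejected outright.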

I seem to recall a thread on cboard a while ago about implementing a spell checker in C, where CommonTater mentioned an algorithm that matches similar-sounding words, but I can't find it now. Are there any others besides Soundex?

Is this the way you would go about making a chatbot? Any suggestions on how I can make it more tolerant when matching user input to keywords, without overdoing it so that some nonsense text string matches a keyword?

**anduril462** · 01-10-2012

No, Soundex isn't restricted to names, though the rules of the original Soundex algorithm were designed for the pronunciation of English names, so it's probably less well suited to ordinary words. The Wikipedia article on Soundex lists a few other algorithms for encoding pronunciation. You might look into Metaphone.

If it were me, I would want to run the words the user typed through something like a spelling suggestion algorithm, not just Levenshtein distance. A word like "qqq" should come up with no suggestions, and your chat bot could say "Sorry, I don't understand you." You're interested in processing words that are actually supposed to be English, not what's produced by some two-year-old smashing on the keyboard. GNU's Aspell is open source, and there's a library and API to use, so look into that. Here are some links:

I suspect combining the two methods will produce better results if you properly tune the system, but I have no proof for that, just a hunch. And I have no idea how you would tune your system, except perhaps empirically.

