Simply checking spelling only involves determining whether a word exists in the dictionary (a trivial task once you've loaded the file into a std::set).
You seem to be attempting spelling suggestion as well. However, you've used a rather simplistic technique that will only catch fairly basic errors. Real spelling suggesters are much more advanced: they rate certain kinds of error as more likely than others, and so can offer more likely suggestions first. They likely work very differently too.
You should probably read up on Minimum Edit Distance.
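As a starting point, here is a rough sketch of the classic dynamic-programming version (Levenshtein distance). The function name editDistance is my own, and every insertion, deletion and substitution costs 1 here; a real suggester would probably weight the operations differently:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b.
std::size_t editDistance(const std::string& a, const std::string& b)
{
    // previous[j] is the distance between the first i - 1 characters of a
    // and the first j characters of b; current is the row being filled in.
    std::vector<std::size_t> previous(b.size() + 1), current(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) previous[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        current[0] = i; // delete all i characters of a to reach the empty string
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            std::size_t substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            std::size_t deletion = previous[j] + 1;
            std::size_t insertion = current[j - 1] + 1;
            current[j] = std::min(substitution, std::min(deletion, insertion));
        }
        previous.swap(current);
    }
    return previous[b.size()];
}
```

Note that in this plain form a transposition such as "teh" for "the" counts as two edits, not one.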
My objective at the beginning is just to write a simple, small and fast spell corrector, but over the next few days I will try to improve it. Your reading suggestion will be very useful to me, thank you.
OK, I have reviewed all the suggestions, and I thank you all. Here's the result so far:
SpellCorrector.h
Code:
#ifndef SPELLCORRECTOR_H // names starting with an underscore and a capital letter are reserved
#define SPELLCORRECTOR_H

#include <iostream>
#include <map>
#include <fstream>
#include <sstream>
#include <algorithm>
#include <string>
#include <vector>

typedef std::map<std::string, int> Dictionary;
typedef std::pair<std::string, int> Pair;
typedef std::vector<std::string> Vector;

class SpellCorrector
{
private:
    static const char alphabet[];
    Dictionary dictionary;

    void edits(std::string& word, Vector& result);
    void known(Vector& results, Dictionary& candidates);

public:
    void load(std::string filename);
    std::string correct(std::string word);
};

#endif
SpellCorrector.cpp
Code:
#include "SpellCorrector.h"
#include <cctype>

using namespace std;

const char SpellCorrector::alphabet[] = "abcdefghijklmnopqrstuvwxyz";

// Orders word/count pairs by count, so max_element returns the most frequent word.
bool sortBySecond(const Pair& left, const Pair& right)
{
    return left.second < right.second;
}
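void SpellCorrector::load(string filename)
{
    ifstream file(filename.c_str());
    if (!file) { return; } // nothing to load if the file cannot be opened

    file.seekg(0, ios_base::end);
    int length = static_cast<int>(file.tellg());
    file.seekg(0, ios_base::beg);
    if (length <= 0) { return; }

    char* data = new char[length];
    file.read(data, length);
    // The buffer is not null-terminated, so pass the number of bytes actually read.
    string line(data, static_cast<string::size_type>(file.gcount()));
    delete [] data;

    // tolower needs a value representable as unsigned char, hence the cast.
    for (string::size_type i = 0; i < line.size(); i++)
    {
        line[i] = static_cast<char>(tolower(static_cast<unsigned char>(line[i])));
    }

    // Count every run of alphabet characters as one word.
    string::size_type position = 0;
    while ((position = line.find_first_of(alphabet, position)) != string::npos)
    {
        string::size_type endPosition = line.find_first_not_of(alphabet, position);
        dictionary[line.substr(position, endPosition - position)]++;
        position = endPosition;
    }
}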
string SpellCorrector::correct(string word)
{
    Vector result;
    Dictionary candidates;

    // A known word needs no correction.
    if (dictionary.find(word) != dictionary.end()) { return word; }

    // Try known words one edit away and return the most frequent one.
    edits(word, result);
    known(result, candidates);
    if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), sortBySecond)->first; }

    // Otherwise try known words two edits away.
    for (unsigned int i = 0; i < result.size(); i++)
    {
        Vector subResult;
        edits(result[i], subResult);
        known(subResult, candidates);
    }
    if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), sortBySecond)->first; }

    return ""; // no suggestion found
}
void SpellCorrector::known(Vector& results, Dictionary& candidates)
{
    Dictionary::iterator end = dictionary.end();
    for (unsigned int i = 0; i < results.size(); i++)
    {
        Dictionary::iterator value = dictionary.find(results[i]);
        if (value != end) candidates[value->first] = value->second;
    }
}
void SpellCorrector::edits(string& word, Vector& result)
{
    // Deletions: drop the character at position i.
    for (string::size_type i = 0; i < word.size(); i++)
        result.push_back(word.substr(0, i) + word.substr(i + 1));

    // Transpositions: swap the characters at i and i + 1 (both must be kept).
    // Testing i + 1 < size avoids unsigned wrap-around when word is empty.
    for (string::size_type i = 0; i + 1 < word.size(); i++)
        result.push_back(word.substr(0, i) + word[i + 1] + word[i] + word.substr(i + 2));

    for (char j = 'a'; j <= 'z'; ++j)
    {
        // Alterations: replace the character at position i with j.
        for (string::size_type i = 0; i < word.size(); i++) result.push_back(word.substr(0, i) + j + word.substr(i + 1));
        // Insertions: insert j before position i, including at the very end.
        for (string::size_type i = 0; i < word.size() + 1; i++) result.push_back(word.substr(0, i) + j + word.substr(i));
    }
}
By the way, why should I implement the class functions in a separate .cpp? I have always done so, but without a good reason, and I think header-only implementations make a library easier to redistribute, as the STL and some Boost libraries do. More suggestions would be great.