Thread: Software optimization

  1. #1
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220

    Software optimization

    Hey, i have writen a spell corrector class. But now i wanted it to be optimized/reviewed so that i can learn from my errors. Readability, performance, style, every suggestions is worths now:

    Code:
    /* 
     * SpellChecker.h
     * 
     * Copyright  (C)  2007  Felipe Farinon <[email protected]>
     * 
     * Version: 1.0
     * Author: Felipe Farinon <[email protected]>
     * Maintainer: Felipe Farinon <[email protected]>
     * URL: http://scarvenger.wordpress.com/
     * 
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation; either version 2 of the License, or
     * (at your option) any later version.
     * 
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     * GNU General Public License for more details.
     * 
     * You should have received a copy of the GNU General Public License
     * along with this program; if not, write to the Free Software
     * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
     * 
     * Commentary: 
     * 
     * See http://scarvenger.wordpress.com/.
     * 
     * Code:
     */
    
    #ifndef _SPELLCHEKER_H_
    #define _SPELLCHEKER_H_
    
    #include <iostream>
    #include <map>
    #include <fstream>
    #include <sstream>
    #include <algorithm>
    #include <vector>
    
    #include <boost/foreach.hpp>
    
    using namespace std;
    
    typedef map<string, int> Dictionary;
    typedef pair<string, int> Pair;
    typedef vector<string> Vector;
    
    bool max(const Pair& left, const Pair& right)
    {
    	return left.second < right.second;
    }
    
    class SpellChecker
    {
    private:
    	static const char alphabet[];
    	Dictionary dictionary;
    
    	void edits(string& word, Vector& result)
    	{
    		for (string::size_type i = 0;i < word.size();     i++) result.push_back(word.substr(0, i)                  + word.substr(i + 1)); //deletions
    		for (string::size_type i = 0;i < word.size() - 1;i++) result.push_back(word.substr(0, i) + word[i+1] + word.substr(i + 2)); //transposition
    
    		for (char j = 'a';j <= 'z';++j)
    		{
    			for (string::size_type i = 0;i < word.size();     i++) result.push_back(word.substr(0, i) + j + word.substr(i + 1)); //alterations
    			for (string::size_type i = 0;i < word.size() + 1;i++) result.push_back(word.substr(0, i) + j + word.substr(i)     ); //insertion
    		}
    	}
    
    	void known(Vector& results, Dictionary& candidates)
    	{
    		Dictionary::iterator end = dictionary.end();
    		for (unsigned int i = 0;i < results.size();i++)
    		{
    			Dictionary::iterator value = dictionary.find(results[i]);
    		
    			if (value != end) candidates[value->first] = value->second;
    		}
    	}
    
    	void stringToLower(string& text)
    	{
    		int lenght = text.size();
    		for (int i = 0;i < lenght;i++)
    		{
    			text[i] = tolower(text[i]);
    		}
    	}
    
    public:
    	bool load(const char* filename)
    	{
    		ifstream file(filename);
    		char* data;
    	
    		file.seekg(0, ios_base::end);
    		int lenght = file.tellg();
    		file.seekg(0, ios_base::beg);
    
    		data = new char[lenght+1];
    
    		file.read(data, lenght);
    
    		string line(data);
    
    		delete [] data;
    	
    		stringToLower(line);
    
    		string::size_type position = 0;
    		while ((position = line.find_first_of(alphabet, position)) != string::npos)
    		{
    			string::size_type endPosition = line.find_first_not_of(alphabet, position);
    
    			dictionary[line.substr(position, endPosition - position)]++;
    
    			position = endPosition;
    		}
    
    		return true;
    	}
    
    	string check(string word)
    	{
    		Vector result;
    		Dictionary candidates;
    
    		if (dictionary.find(word) != dictionary.end()) { return word; }
    
    		edits(word, result);
    		known(result, candidates);
    
    		if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), max)->first; }
    
    		for (unsigned int i = 0;i < result.size();i++)
    		{
    			Vector subResult;
    			edits(result[i], subResult);
    			known(subResult, candidates);
    		}
    
    		if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), max)->first; }
    
    		return "";
    	}
    };
    
    const char SpellChecker::alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    
    
    #endif

  2. #2
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    I didn't actually look at the algorithms or techniques you used to solve the problems, but these are just a few comment about the style:
    • I don't like the global typedefs. Dictionary is an ok name, but the names Pair and Vector are too generic to describe what the type actually is.
    • >> data = new char[lenght+1];
      I prefer to use vector for this. In your code, it probably won't end up mattering, but using vector is safer because you won't have a memory leak no matter what and you don't have to worry about a memory leak. For example, what if you want to exit the function early if read() fails? You have to remember to delete [] data. If data was a vector, you wouldn't have to worry about it.
    • You might as well use transform for your toLower function.
    • I'd have load take a string. Then just call c_str() from the string. Your other functions use string, so it's best to keep a consistent interface.
    • Your max function is oddly named. It is meant to sort pairs by the second value. Perhaps SortBySecond or something like that would make more sense?

  3. #3
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    using namespace std;

    NEVER do this in a header file!
    Otherwise you force everyone who uses your header file to import the entire std namespace whether they want to or not.

    If you want to use a using statement, keep them in .cpp files AFTER all your #include statements.

    Also, I don't know if you left out comments to shorten the post, but I think it's a good idea to always document your functions thoroughly so other people reading it know how/what it's supposed to do, and so you'll know 6 months from now when you look at your code again and scratch your head.
    Here's an example of one of my functions where I use Doxygen style comments:

    Code:
    	/** This assignment operator copies the pointer held by rhs to this object, and also
    	 *  transfers ownership from rhs to this object by setting rhs to NULL.
    	 *
    	 *  Any existing pointer held by this object will be deleted before assigning the rhs
    	 *  pointer to this object.
    	 *
    	 *  @param rhs [IN] - This is a reference to another AutoPtr object which will be
    	 *			copied into this object.  rhs will be set to NULL.
    	 *
    	 *  @return AutoPtr& - It will return a reference to this object (i.e.  *this).
    	 */
    	AutoPtr<T_Type, T_Deletor>& operator=( AutoPtr<T_Type, T_Deletor>&  rhs )	// throw()
    	{
    		// Make sure we aren't assigning to ourself.
    		if ( this != &rhs )
    		{
    			// If the pointers aren't identical and our pointer isn't NULL, delete our pointer.
    			if ( m_Ptr != rhs.m_Ptr )
    			{
    				if ( m_Ptr != NULL )
    				{
    					m_Deletor( m_Ptr );
    				}
    			}
    
    			// Copy the pointer and release rhs from ownership of the pointer.
    			m_Ptr = rhs.release();
    		}
    
    		return *this;
    	}
    Last edited by cpjust; 12-11-2007 at 06:23 PM. Reason: Add comment about comments.

  4. #4
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Method bodies, particularly such large ones belong in a separate C++ file. So should non inline functions like max. In this case you can alternately just put the key word inline before the function.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  5. #5
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    You misspelled length as lenght.

    Don't make your load function hard-coded to return true. It should either be able to return something different when it fails or it should throw an exception, in which case there's no point in returning anything upon success, as the lack of an exception being thrown says that it succeeded.

    Simply checking spelling only involves determining if a word exists in the dictionary. (A trivial task once you've loaded the file in to a std::set).
    You seem to be attempting to do spelling suggesting as well. However you've used a rather simplistic technique that will only catch pretty basic errors. Real spelling suggestors are much more advanced. They would also rate certain kinds of error as more likely and hence can provide more likely suggestions. They likely work very differently too.
    You should probably read up on Minimum Edit Distance.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  6. #6
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    Simply checking spelling only involves determining if a word exists in the dictionary. (A trivial task once you've loaded the file in to a std::set).
    You seem to be attempting to do spelling suggesting as well. However you've used a rather simplistic technique that will only catch pretty basic errors. Real spelling suggestors are much more advanced. They would also rate certain kinds of error as more likely and hence can provide more likely suggestions. They likely work very differently too.
    You should probably read up on Minimum Edit Distance.
    My objective in the beggining is just to write a simple, small and fast spell corrector, but in the next days i will try to improve it, your reading will be very usefull to me, thank you.

    Ok, have reviewed all suggestions and I thank you all, heres the result so far:

    SpellCorrector.h
    Code:
    #ifndef _SPELLCORRECTOR_H_
    #define _SPELLCORRECTOR_H_
    
    #include <iostream>
    #include <map>
    #include <fstream>
    #include <sstream>
    #include <algorithm>
    #include <vector>
    
    typedef std::map<std::string, int> Dictionary;
    typedef std::pair<std::string, int> Pair;
    typedef std::vector<std::string> Vector;
    
    class SpellCorrector
    {
    private:
    	static const char alphabet[];
    	Dictionary dictionary;
    
    	void edits(std::string& word, Vector& result);
    	void known(Vector& results, Dictionary& candidates);
    
    public:
    	void load(std::string filename);
    	std::string correct(std::string word);
    };
    
    #endif
    SpellCorrector.cpp
    Code:
    #include "SpellCorrector.h"
    
    using namespace std;
    
    const char SpellCorrector::alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    
    bool sortBySecond(const Pair& left, const Pair& right)
    {
    	return left.second < right.second;
    }
    
    void SpellCorrector::load(string filename)
    {
    	ifstream file(filename.c_str());
    	char* data;
    	
    	file.seekg(0, ios_base::end);
    	int length = file.tellg();
    	file.seekg(0, ios_base::beg);
    
    	data = new char[length+1];
    
    	file.read(data, length);
    
    	string line(data);
    
    	delete [] data;
    	
    	transform(line.begin(), line.end(), line.begin(), tolower);
    
    	string::size_type position = 0;
    	while ((position = line.find_first_of(alphabet, position)) != string::npos)
    	{
    		string::size_type endPosition = line.find_first_not_of(alphabet, position);
    
    		dictionary[line.substr(position, endPosition - position)]++;
    
    		position = endPosition;
    	}
    }
    
    string SpellCorrector::correct(string word)
    {
    	Vector result;
    	Dictionary candidates;
    
    	if (dictionary.find(word) != dictionary.end()) { return word; }
    
    	edits(word, result);
    	known(result, candidates);
    
    	if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), sortBySecond)->first; }
    
    	for (unsigned int i = 0;i < result.size();i++)
    	{
    		Vector subResult;
    		edits(result[i], subResult);
    		known(subResult, candidates);
    	}
    
    	if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), sortBySecond)->first; }
    
    	return "";
    }
    
    void SpellCorrector::known(Vector& results, Dictionary& candidates)
    {
    	Dictionary::iterator end = dictionary.end();
    	for (unsigned int i = 0;i < results.size();i++)
    	{
    		Dictionary::iterator value = dictionary.find(results[i]);
    		
    		if (value != end) candidates[value->first] = value->second;
    	}
    }
    
    void SpellCorrector::edits(string& word, Vector& result)
    {
    	for (string::size_type i = 0;i < word.size();     i++) result.push_back(word.substr(0, i)                  + word.substr(i + 1)); //deletions
    	for (string::size_type i = 0;i < word.size() - 1;i++) result.push_back(word.substr(0, i) + word[i+1] + word.substr(i + 2)); //transposition
    
    	for (char j = 'a';j <= 'z';++j)
    	{
    		for (string::size_type i = 0;i < word.size();     i++) result.push_back(word.substr(0, i) + j + word.substr(i + 1)); //alterations
    		for (string::size_type i = 0;i < word.size() + 1;i++) result.push_back(word.substr(0, i) + j + word.substr(i)     ); //insertion
    	}
    }
    By the way, why should i implement the class functions in the separate .cpp? I have always done this but with no good reason to do that, and i think that header style implementations are easier to redistribute a library just like STL and some boost libraries do... More suggestions would be great.
    Last edited by Scarvenger; 12-12-2007 at 08:14 AM.

  7. #7
    The larch
    Join Date
    May 2006
    Posts
    3,573
    By the way, why should i implement the class functions in the separate .cpp? I have always done this but with no good reason to do that, and i think that header style implementations are easier to redistribute a library just like STL and some boost libraries do...
    Unless the class is templated, implementing it in the header also makes compiling times longer, as well as throws code in that the user shouldn't see. With templates the class must be implemented in the header as virtually no compiler supports export keyword that would make the separation possible.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  8. #8
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Now you should also move some of the #include directives from your header file to your .cpp file. Try to #include as little as possible in header files -- basically you would only need <string> <map> & <vector>. Everything else is only used in your .cpp file.

  9. #9
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    It, for one thing, makes it a mess to traverse the header to look at all the functions and members a class contains.
    Last edited by Elysia; 12-13-2007 at 01:42 AM.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  10. #10
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote Originally Posted by Elysia View Post
    It also makes it a mess to traverse the header to look at all the functions and members a class contains.
    Yeah, that's one of the reasons I like C++ over Java. But then there's so many C++ things that I wish Java had...

  11. #11
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    The compiler doesn't automatically inline functions that are in header files, but you have to put inline functions in header files, for the compiler to do the replacement.

  12. #12
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Also, from what I understand, even if you tell the compiler to inline a function, the compiler is free to do whatever it thinks is best. So inline is really more of a suggestion to the compiler.

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by cpjust View Post
    Also, from what I understand, even if you tell the compiler to inline a function, the compiler is free to do whatever it thinks is best. So inline is really more of a suggestion to the compiler.
    This is correct, the compiler will only inline functions if it "feels it's a good idea", which has some pretty complex heuristics, e.g. a medium sized function may be inlined if it's only called a few times, but add a few more calls, and it will not be inlined, because the compiler says "it bloats the code too much". Very short functions are usually always inlined - assuming the compiler can figure out how to do that, that is. Virtual functions can only be inlined when the compiler can know for sure what the exact function to call is, for example [assuming all other criteria apply to make it inlined].

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  14. #14
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Microsoft's compiler, for example, has been completely ignoring the inline keyword (aside from the code folding requirement it implies) for quite a few versions now.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  15. #15
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by cpjust View Post
    If you want to use a using statement, keep them in .cpp files AFTER all your #include statements.
    Because most code is contained within functions, you can do a "using namespace" inside function scope to make life easier without affecting users of the header file. Of course, you'd only have function code in a header if you were writing inline or template functions, but that's pretty common. For instance:

    Code:
    // File my_header.hh
    
    using namespace std; // BAD
    
    inline void some_inline_func()
    {
        using namespace std; // OKAY
    
        code_that_uses_stuff_from_std;
    }

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Software Design/Test - Redmond, WA
    By IRVolt in forum Projects and Job Recruitment
    Replies: 2
    Last Post: 06-11-2008, 10:26 AM
  2. Why C Matters
    By DavidP in forum A Brief History of Cprogramming.com
    Replies: 136
    Last Post: 01-16-2008, 09:09 AM
  3. Adding trial period to software
    By BobS0327 in forum C Programming
    Replies: 17
    Last Post: 01-03-2006, 02:13 PM
  4. Optimization settings
    By Roaring_Tiger in forum C Programming
    Replies: 4
    Last Post: 02-23-2005, 02:53 AM