Thread: Software optimization

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220

    Software optimization

    Hey, i have writen a spell corrector class. But now i wanted it to be optimized/reviewed so that i can learn from my errors. Readability, performance, style, every suggestions is worths now:

    Code:
    /* 
     * SpellChecker.h
     * 
     * Copyright  (C)  2007  Felipe Farinon <[email protected]>
     * 
     * Version: 1.0
     * Author: Felipe Farinon <[email protected]>
     * Maintainer: Felipe Farinon <[email protected]>
     * URL: http://scarvenger.wordpress.com/
     * 
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation; either version 2 of the License, or
     * (at your option) any later version.
     * 
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     * GNU General Public License for more details.
     * 
     * You should have received a copy of the GNU General Public License
     * along with this program; if not, write to the Free Software
     * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
     * 
     * Commentary: 
     * 
     * See http://scarvenger.wordpress.com/.
     * 
     * Code:
     */
    
    #ifndef _SPELLCHEKER_H_
    #define _SPELLCHEKER_H_
    
    #include <iostream>
    #include <map>
    #include <fstream>
    #include <sstream>
    #include <algorithm>
    #include <vector>
    
    #include <boost/foreach.hpp>
    
    using namespace std;
    
    typedef map<string, int> Dictionary;
    typedef pair<string, int> Pair;
    typedef vector<string> Vector;
    
    bool max(const Pair& left, const Pair& right)
    {
    	return left.second < right.second;
    }
    
    class SpellChecker
    {
    private:
    	static const char alphabet[];
    	Dictionary dictionary;
    
    	void edits(string& word, Vector& result)
    	{
    		for (string::size_type i = 0;i < word.size();     i++) result.push_back(word.substr(0, i)                  + word.substr(i + 1)); //deletions
    		for (string::size_type i = 0;i < word.size() - 1;i++) result.push_back(word.substr(0, i) + word[i+1] + word.substr(i + 2)); //transposition
    
    		for (char j = 'a';j <= 'z';++j)
    		{
    			for (string::size_type i = 0;i < word.size();     i++) result.push_back(word.substr(0, i) + j + word.substr(i + 1)); //alterations
    			for (string::size_type i = 0;i < word.size() + 1;i++) result.push_back(word.substr(0, i) + j + word.substr(i)     ); //insertion
    		}
    	}
    
    	void known(Vector& results, Dictionary& candidates)
    	{
    		Dictionary::iterator end = dictionary.end();
    		for (unsigned int i = 0;i < results.size();i++)
    		{
    			Dictionary::iterator value = dictionary.find(results[i]);
    		
    			if (value != end) candidates[value->first] = value->second;
    		}
    	}
    
    	void stringToLower(string& text)
    	{
    		int lenght = text.size();
    		for (int i = 0;i < lenght;i++)
    		{
    			text[i] = tolower(text[i]);
    		}
    	}
    
    public:
    	bool load(const char* filename)
    	{
    		ifstream file(filename);
    		char* data;
    	
    		file.seekg(0, ios_base::end);
    		int lenght = file.tellg();
    		file.seekg(0, ios_base::beg);
    
    		data = new char[lenght+1];
    
    		file.read(data, lenght);
    
    		string line(data);
    
    		delete [] data;
    	
    		stringToLower(line);
    
    		string::size_type position = 0;
    		while ((position = line.find_first_of(alphabet, position)) != string::npos)
    		{
    			string::size_type endPosition = line.find_first_not_of(alphabet, position);
    
    			dictionary[line.substr(position, endPosition - position)]++;
    
    			position = endPosition;
    		}
    
    		return true;
    	}
    
    	string check(string word)
    	{
    		Vector result;
    		Dictionary candidates;
    
    		if (dictionary.find(word) != dictionary.end()) { return word; }
    
    		edits(word, result);
    		known(result, candidates);
    
    		if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), max)->first; }
    
    		for (unsigned int i = 0;i < result.size();i++)
    		{
    			Vector subResult;
    			edits(result[i], subResult);
    			known(subResult, candidates);
    		}
    
    		if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), max)->first; }
    
    		return "";
    	}
    };
    
    const char SpellChecker::alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    
    
    #endif

  2. #2
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    I didn't actually look at the algorithms or techniques you used to solve the problems, but these are just a few comment about the style:
    • I don't like the global typedefs. Dictionary is an ok name, but the names Pair and Vector are too generic to describe what the type actually is.
    • >> data = new char[lenght+1];
      I prefer to use vector for this. In your code, it probably won't end up mattering, but using vector is safer because you won't have a memory leak no matter what and you don't have to worry about a memory leak. For example, what if you want to exit the function early if read() fails? You have to remember to delete [] data. If data was a vector, you wouldn't have to worry about it.
    • You might as well use transform for your toLower function.
    • I'd have load take a string. Then just call c_str() from the string. Your other functions use string, so it's best to keep a consistent interface.
    • Your max function is oddly named. It is meant to sort pairs by the second value. Perhaps SortBySecond or something like that would make more sense?

  3. #3
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    using namespace std;

    NEVER do this in a header file!
    Otherwise you force everyone who uses your header file to import the entire std namespace whether they want to or not.

    If you want to use a using statement, keep them in .cpp files AFTER all your #include statements.

    Also, I don't know if you left out comments to shorten the post, but I think it's a good idea to always document your functions thoroughly so other people reading it know how/what it's supposed to do, and so you'll know 6 months from now when you look at your code again and scratch your head.
    Here's an example of one of my functions where I use Doxygen style comments:

    Code:
    	/** This assignment operator copies the pointer held by rhs to this object, and also
    	 *  transfers ownership from rhs to this object by setting rhs to NULL.
    	 *
    	 *  Any existing pointer held by this object will be deleted before assigning the rhs
    	 *  pointer to this object.
    	 *
    	 *  @param rhs [IN] - This is a reference to another AutoPtr object which will be
    	 *			copied into this object.  rhs will be set to NULL.
    	 *
    	 *  @return AutoPtr& - It will return a reference to this object (i.e.  *this).
    	 */
    	AutoPtr<T_Type, T_Deletor>& operator=( AutoPtr<T_Type, T_Deletor>&  rhs )	// throw()
    	{
    		// Make sure we aren't assigning to ourself.
    		if ( this != &rhs )
    		{
    			// If the pointers aren't identical and our pointer isn't NULL, delete our pointer.
    			if ( m_Ptr != rhs.m_Ptr )
    			{
    				if ( m_Ptr != NULL )
    				{
    					m_Deletor( m_Ptr );
    				}
    			}
    
    			// Copy the pointer and release rhs from ownership of the pointer.
    			m_Ptr = rhs.release();
    		}
    
    		return *this;
    	}
    Last edited by cpjust; 12-11-2007 at 06:23 PM. Reason: Add comment about comments.

  4. #4
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by cpjust View Post
    If you want to use a using statement, keep them in .cpp files AFTER all your #include statements.
    Because most code is contained within functions, you can do a "using namespace" inside function scope to make life easier without affecting users of the header file. Of course, you'd only have function code in a header if you were writing inline or template functions, but that's pretty common. For instance:

    Code:
    // File my_header.hh
    
    using namespace std; // BAD
    
    inline void some_inline_func()
    {
        using namespace std; // OKAY
    
        code_that_uses_stuff_from_std;
    }

  5. #5
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Method bodies, particularly such large ones belong in a separate C++ file. So should non inline functions like max. In this case you can alternately just put the key word inline before the function.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  6. #6
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    You misspelled length as lenght.

    Don't make your load function hard-coded to return true. It should either be able to return something different when it fails or it should throw an exception, in which case there's no point in returning anything upon success, as the lack of an exception being thrown says that it succeeded.

    Simply checking spelling only involves determining if a word exists in the dictionary. (A trivial task once you've loaded the file in to a std::set).
    You seem to be attempting to do spelling suggesting as well. However you've used a rather simplistic technique that will only catch pretty basic errors. Real spelling suggestors are much more advanced. They would also rate certain kinds of error as more likely and hence can provide more likely suggestions. They likely work very differently too.
    You should probably read up on Minimum Edit Distance.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  7. #7
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    Simply checking spelling only involves determining if a word exists in the dictionary. (A trivial task once you've loaded the file in to a std::set).
    You seem to be attempting to do spelling suggesting as well. However you've used a rather simplistic technique that will only catch pretty basic errors. Real spelling suggestors are much more advanced. They would also rate certain kinds of error as more likely and hence can provide more likely suggestions. They likely work very differently too.
    You should probably read up on Minimum Edit Distance.
    My objective in the beggining is just to write a simple, small and fast spell corrector, but in the next days i will try to improve it, your reading will be very usefull to me, thank you.

    Ok, have reviewed all suggestions and I thank you all, heres the result so far:

    SpellCorrector.h
    Code:
    #ifndef _SPELLCORRECTOR_H_
    #define _SPELLCORRECTOR_H_
    
    #include <iostream>
    #include <map>
    #include <fstream>
    #include <sstream>
    #include <algorithm>
    #include <vector>
    
    typedef std::map<std::string, int> Dictionary;
    typedef std::pair<std::string, int> Pair;
    typedef std::vector<std::string> Vector;
    
    class SpellCorrector
    {
    private:
    	static const char alphabet[];
    	Dictionary dictionary;
    
    	void edits(std::string& word, Vector& result);
    	void known(Vector& results, Dictionary& candidates);
    
    public:
    	void load(std::string filename);
    	std::string correct(std::string word);
    };
    
    #endif
    SpellCorrector.cpp
    Code:
    #include "SpellCorrector.h"
    
    using namespace std;
    
    const char SpellCorrector::alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    
    bool sortBySecond(const Pair& left, const Pair& right)
    {
    	return left.second < right.second;
    }
    
    void SpellCorrector::load(string filename)
    {
    	ifstream file(filename.c_str());
    	char* data;
    	
    	file.seekg(0, ios_base::end);
    	int length = file.tellg();
    	file.seekg(0, ios_base::beg);
    
    	data = new char[length+1];
    
    	file.read(data, length);
    
    	string line(data);
    
    	delete [] data;
    	
    	transform(line.begin(), line.end(), line.begin(), tolower);
    
    	string::size_type position = 0;
    	while ((position = line.find_first_of(alphabet, position)) != string::npos)
    	{
    		string::size_type endPosition = line.find_first_not_of(alphabet, position);
    
    		dictionary[line.substr(position, endPosition - position)]++;
    
    		position = endPosition;
    	}
    }
    
    string SpellCorrector::correct(string word)
    {
    	Vector result;
    	Dictionary candidates;
    
    	if (dictionary.find(word) != dictionary.end()) { return word; }
    
    	edits(word, result);
    	known(result, candidates);
    
    	if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), sortBySecond)->first; }
    
    	for (unsigned int i = 0;i < result.size();i++)
    	{
    		Vector subResult;
    		edits(result[i], subResult);
    		known(subResult, candidates);
    	}
    
    	if (candidates.size() > 0) { return max_element(candidates.begin(), candidates.end(), sortBySecond)->first; }
    
    	return "";
    }
    
    void SpellCorrector::known(Vector& results, Dictionary& candidates)
    {
    	Dictionary::iterator end = dictionary.end();
    	for (unsigned int i = 0;i < results.size();i++)
    	{
    		Dictionary::iterator value = dictionary.find(results[i]);
    		
    		if (value != end) candidates[value->first] = value->second;
    	}
    }
    
    void SpellCorrector::edits(string& word, Vector& result)
    {
    	for (string::size_type i = 0;i < word.size();     i++) result.push_back(word.substr(0, i)                  + word.substr(i + 1)); //deletions
    	for (string::size_type i = 0;i < word.size() - 1;i++) result.push_back(word.substr(0, i) + word[i+1] + word.substr(i + 2)); //transposition
    
    	for (char j = 'a';j <= 'z';++j)
    	{
    		for (string::size_type i = 0;i < word.size();     i++) result.push_back(word.substr(0, i) + j + word.substr(i + 1)); //alterations
    		for (string::size_type i = 0;i < word.size() + 1;i++) result.push_back(word.substr(0, i) + j + word.substr(i)     ); //insertion
    	}
    }
    By the way, why should i implement the class functions in the separate .cpp? I have always done this but with no good reason to do that, and i think that header style implementations are easier to redistribute a library just like STL and some boost libraries do... More suggestions would be great.
    Last edited by Scarvenger; 12-12-2007 at 08:14 AM.

  8. #8
    The larch
    Join Date
    May 2006
    Posts
    3,573
    By the way, why should i implement the class functions in the separate .cpp? I have always done this but with no good reason to do that, and i think that header style implementations are easier to redistribute a library just like STL and some boost libraries do...
    Unless the class is templated, implementing it in the header also makes compiling times longer, as well as throws code in that the user shouldn't see. With templates the class must be implemented in the header as virtually no compiler supports export keyword that would make the separation possible.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  9. #9
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Now you should also move some of the #include directives from your header file to your .cpp file. Try to #include as little as possible in header files -- basically you would only need <string> <map> & <vector>. Everything else is only used in your .cpp file.

  10. #10
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    It, for one thing, makes it a mess to traverse the header to look at all the functions and members a class contains.
    Last edited by Elysia; 12-13-2007 at 01:42 AM.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  11. #11
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote Originally Posted by Elysia View Post
    It also makes it a mess to traverse the header to look at all the functions and members a class contains.
    Yeah, that's one of the reasons I like C++ over Java. But then there's so many C++ things that I wish Java had...

  12. #12
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    Quote Originally Posted by cpjust View Post
    Yeah, that's one of the reasons I like C++ over Java. But then there's so many C++ things that I wish Java had...
    You might like C# then.

    And by the way, using good programming practices you don't usually end up with that problem in Java or C#, because in order to keep classes loosely coupled (which should always be a major goal, especially if this is a class designed for other people to use), it should be implementing an interface. All you need to provide to the user of your class is your interface; how your class or classes implement it is irrelevant. This allows bare minimal coupling between your classes and theirs.
    You ever try a pink golf ball, Wally? Why, the wind shear on a pink ball alone can take the head clean off a 90 pound midget at 300 yards.

  13. #13
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    After some time spended i could achieve the following code:
    Code:
    void SpellCorrector::load(const std::string& filename)
    {
    	ifstream file(filename.c_str(), ios_base::binary | ios_base::in);
    
    	boost::timer t1;
    
    	file.seekg(0, ios_base::end);
    	int length = file.tellg();
    	file.seekg(0, ios_base::beg);
    
    	string line(length, '0');
    
    	file.read(&line[0], length);
    
    	cout << "t1:" << t1.elapsed() << "s \n";
    
    	boost::timer t2;
    
    	transform(line.begin(), line.end(), line.begin(), tolower);
    
    	cout << "t2:" << t2.elapsed() << "s \n";
    
    	boost::timer t3;
    
    	string::size_type position = 0;
    	while ((position = line.find_first_of(alphabet, position)) != string::npos)
    	{
    		string::size_type endPosition = line.find_first_not_of(alphabet, position);
    
    		dictionary[line.substr(position, endPosition - position)]++;
    
    		position = endPosition;
    	}
    	cout << "t3:" << t3.elapsed() << "s \n";
    }
    with these marks:
    t1:0.078s
    t2:0.109s
    t3:2.109s

    I just wanted to optimize the sring parsing now and i would be happy... I will keep searching...
    Last edited by Scarvenger; 12-14-2007 at 06:12 AM.

  14. #14
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    The compiler doesn't automatically inline functions that are in header files, but you have to put inline functions in header files, for the compiler to do the replacement.

  15. #15
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Also, from what I understand, even if you tell the compiler to inline a function, the compiler is free to do whatever it thinks is best. So inline is really more of a suggestion to the compiler.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Software Design/Test - Redmond, WA
    By IRVolt in forum Projects and Job Recruitment
    Replies: 2
    Last Post: 06-11-2008, 10:26 AM
  2. Why C Matters
    By DavidP in forum A Brief History of Cprogramming.com
    Replies: 136
    Last Post: 01-16-2008, 09:09 AM
  3. Adding trial period to software
    By BobS0327 in forum C Programming
    Replies: 17
    Last Post: 01-03-2006, 02:13 PM
  4. Optimization settings
    By Roaring_Tiger in forum C Programming
    Replies: 4
    Last Post: 02-23-2005, 02:53 AM