IDEA: Spelling Checker



fletch
08-17-2002, 08:54 AM
I think this came up before, but...

How about a program that checks the spelling of a single word or an entire text file? Entries could be graded on speed, accuracy, how it handles upper and lower case letters, etc. Probably need some sort of 'standard' dictionary (I guess I'll make one if this idea ever comes up).
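
A rough C++ sketch of what such an entry might look like, just to make the idea concrete. Everything in it is an assumption for illustration: the function names (to_lower, load_dictionary, is_spelled_correctly, find_misspelled) are made up, the dictionary is assumed to hold one word per line, and case is handled the simplest possible way, by lower-casing both sides.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: lower-case a word so lookups ignore letter case.
std::string to_lower(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}

// Assumes the dictionary stream holds one word per line.
std::unordered_set<std::string> load_dictionary(std::istream& in) {
    std::unordered_set<std::string> words;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) words.insert(to_lower(line));
    }
    return words;
}

// A single word is "correct" if its lower-cased form is in the dictionary.
bool is_spelled_correctly(const std::unordered_set<std::string>& dict,
                          const std::string& word) {
    return dict.count(to_lower(word)) > 0;
}

// Checks whitespace-separated words in a block of text and returns the
// ones not found. Assumes punctuation has already been stripped.
std::vector<std::string> find_misspelled(const std::unordered_set<std::string>& dict,
                                         const std::string& text) {
    std::vector<std::string> misspelled;
    std::istringstream tokens(text);
    std::string word;
    while (tokens >> word) {
        if (!is_spelled_correctly(dict, word)) misspelled.push_back(word);
    }
    return misspelled;
}
```

With a dictionary loaded from a file stream, is_spelled_correctly(dict, "APPLE") would cover the single-word case and find_misspelled the whole-file case. A hash set keeps each lookup roughly constant time, which matters if entries are graded on speed.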

ygfperson
08-17-2002, 11:15 AM
a dictionary would really help matters...
but i don't know of any free ones

fletch
08-17-2002, 12:11 PM
Take a look at Project Gutenberg (http://promo.net/pg/). A copy of Webster's Unabridged Dictionary can be found at this FTP site (ftp://ibiblio.org/pub/docs/books/gutenberg/etext96/). Look for the files named pgwxxxx.xxx. The files have tags for formatting, but I'd be willing to go through and extract the individual words. The only problem I see would be the huge number of words...or perhaps the dictionary could be divided up and it would be up to the contestant to find a way to use the dictionary effectively. On the other hand, maybe a smaller homemade dictionary (a couple hundred entries) of commonly used words would be better. A smaller dictionary would be less daunting and might attract more contestants...just a thought.

ygfperson
08-17-2002, 05:25 PM
i've written a small c++ program that extracts the text from a file and prints out only the words, stripped of non-alphabetic characters. i've only tested this on the 'c' part of the dictionary, but it should work for all of it.
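
The attached program isn't preserved in this archive, so here is a minimal sketch of the general technique instead, not the original code. It assumes the formatting tags are delimited by angle brackets (the thread doesn't describe the tag format), and it keeps every word outside a tag rather than only the dictionary headwords. The name extract_words is made up.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Splits text into words, skipping anything between '<' and '>' (assumed
// tag syntax) and dropping every character that is not a letter.
std::vector<std::string> extract_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    bool in_tag = false;

    // Push the word collected so far, if any, and start a new one.
    auto flush = [&]() {
        if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    };

    for (char raw : text) {
        unsigned char c = static_cast<unsigned char>(raw);
        if (raw == '<') {
            flush();          // a tag ends the current word
            in_tag = true;
        } else if (raw == '>') {
            in_tag = false;
        } else if (in_tag) {
            continue;         // ignore tag contents
        } else if (std::isalpha(c)) {
            current += raw;   // letters build up the word
        } else if (std::isspace(c)) {
            flush();          // whitespace separates words
        }
        // any other character (digits, punctuation) is simply dropped
    }
    flush();
    return words;
}
```

One side effect of dropping punctuation instead of splitting on it: an apostrophe or hyphen inside a word is removed and the two halves are joined, which may or may not be what you want for a dictionary.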

ygfperson
08-17-2002, 05:42 PM
forgot to mention that you have to change the input filename depending on the file.

fletch
08-17-2002, 05:55 PM
Yeah, I figured that out :)

Unfortunately, I lack your programming savvy. I was trying to work something out along the same lines as you (except using the standard C string functions - not STL) and not really getting very far very fast. Now I'm using your dict-list.cpp as a crash course in the STL.

Hammer
08-17-2002, 06:46 PM
Originally posted by fletch

I was trying to work something out along the same lines as you (except using the standard C string functions - not STL) and not really getting very far very fast.
For purely educational purposes, here's a C version that will give the same end result as ygf's C++ one. (ygf, feel free to delete this if you think it's roaming OT).

fletch
08-17-2002, 07:24 PM
I now have a 1.18 MB text file dictionary containing 118,964 words. Still needs a little work though - I need to delete duplicate and multiple word entries.

Thanks for the educational opportunity ygf and Hammer!

ygfperson
08-17-2002, 09:42 PM
Originally posted by Hammer

For purely educational purposes, here's a C version that will give the same end result as ygf's C++ one. (ygf, feel free to delete this if you think it's roaming OT).
no... that wouldn't be fair. ;)

just letting everyone know who happens to be reading this...
anything related to the topic can be posted inside the IDEA thread. you can also post for the sole purpose of bumping the thread to the top.

fletch
08-17-2002, 09:46 PM
ygf,

I just emailed you a dictionary in case this contest ever comes up. It was too big to post here...

BTW, what's a modamater?

ygfperson
08-17-2002, 09:57 PM
Originally posted by fletch

BTW, what's a modamater?
i dunno
what's the matta wit you? *groans* :D

fletch
08-17-2002, 10:02 PM
Originally posted by ygfperson
what's the matta wit you? *groans* :D
nuttin'

ygfperson
08-17-2002, 10:56 PM
i checked your dictionary file, and i think there are some words that are out of order, like Zymosimeter and Zymophyte. i'm assuming that when the words are put into correct order, checking for duplicates should get as easy as checking word x with word x+1 for equality. i'll write up a small program to correct these inaccuracies
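
The "word x against word x+1" check could look something like this. It is only a sketch under the assumption that the list is already sorted, and the function name remove_adjacent_duplicates is made up. std::unique from the standard library does the same job in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns a copy of the list with repeated words removed.
// Precondition: the input is sorted, so equal words sit next to each other.
std::vector<std::string> remove_adjacent_duplicates(
        const std::vector<std::string>& sorted_words) {
    assert(std::is_sorted(sorted_words.begin(), sorted_words.end()));

    std::vector<std::string> unique_words;
    for (std::size_t x = 0; x < sorted_words.size(); ++x) {
        // Skip word x when it equals word x+1; the last copy in each run is kept.
        if (x + 1 < sorted_words.size() && sorted_words[x] == sorted_words[x + 1]) {
            continue;
        }
        unique_words.push_back(sorted_words[x]);
    }
    return unique_words;
}
```

If the list is not sorted, duplicates that are far apart slip through, which is why the sorting has to happen first.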

fletch
08-17-2002, 11:17 PM
Originally posted by ygfperson
i checked your dictionary file, and i think there are some words that are out of order, like Zymosimeter and Zymophyte. i'm assuming that when the words are put into correct order, checking for duplicates should get as easy as checking word x with word x+1 for equality.
That's the order that they came out of the dictionary in :D
That's what I did to get rid of the other duplicate entries. Of course that was based on the assumption that the dictionary was in alphabetical order. (Not so) Obviously that was an erroneous assumption. Oh, well...you know what they say about assume.

ygfperson
08-17-2002, 11:20 PM
yeah, well... :D

everybody makes mistakes, except my 1ghz athlon amd

i loaded every word into memory with the attached program, sorted them, then printed them back out in the correct order. god, i love the stl... imagine trying to write this kind of stuff from scratch.
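
The attachment isn't part of this archive, but the load, sort and print steps described above can be sketched in a few lines of STL. This is an illustration only, not the attached program: the name write_sorted is made up, and the input is assumed to be whitespace-separated words.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>  // string streams, handy for trying this without files
#include <string>
#include <vector>

// Reads every whitespace-separated word from `in` into memory, sorts the
// words, and writes them to `out`, one per line.
void write_sorted(std::istream& in, std::ostream& out) {
    std::istream_iterator<std::string> first(in), last;
    std::vector<std::string> words(first, last);

    // Plain std::sort orders by character code, so capitalised words come
    // before lower-case ones ("Zebra" sorts ahead of "apple").
    std::sort(words.begin(), words.end());

    for (const std::string& word : words) {
        out << word << '\n';
    }
}
```

Passing a std::ifstream and a std::ofstream (or std::cout) would give the file-to-file behaviour. For a list of roughly 119,000 words, holding everything in a vector is no problem for memory.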