I sat in on a master's thesis seminar that inspired this idea. The problem is to create a summary of a given document. The trade-off is accuracy versus readability.
Write a program that will create a 50-word summary of a 300-word document. Summaries should be judged in two categories: accuracy and human readability. Two simple approaches show the extremes:
1) Take the top 50 words based on frequency count. The accuracy rating will be high, but human readability will be low.
2) Take the first 'n' sentences, up to 50 words. Human readability will be high, but many important points in the document will be left out, resulting in a low accuracy score.
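For anyone who wants a starting point, here is a rough Python sketch of the two baseline approaches above. The function names, the regex-based word and sentence splitting, and the choice to count distinct lowercased words are all my own assumptions, not part of any spec.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def top_words_summary(text, limit=50):
    """Approach 1: the `limit` most frequent distinct words, most frequent first."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return " ".join(word for word, _ in Counter(words).most_common(limit))


def leading_sentences_summary(text, limit=50):
    """Approach 2: whole sentences from the start, stopping before `limit` words is exceeded."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chosen, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if used + n > limit:
            break
        chosen.append(sentence)
        used += n
    return " ".join(chosen)
```

Calling either function with the document text and `limit=50` gives a contest-sized summary. The naive sentence splitter will trip over abbreviations like "e.g.", so treat it as a placeholder.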
The objective is to find a balance somewhere in between: a summary that encompasses the most information while still being human-readable.
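One possible middle ground, sketched below, is to score whole sentences by how frequent their words are in the document, then keep the best-scoring sentences that fit in the word budget and print them in their original order. This is only one illustrative heuristic, not a recommended solution; the function name, the tiny stopword list, and the average-frequency score are all assumptions of mine.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list; a real entry would likely use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that",
             "on", "for", "as", "with", "was", "at", "by"}


def extractive_summary(text, limit=50):
    """Keep the highest-scoring whole sentences that fit within `limit` words."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokenized = [
        [w for w in re.findall(r"[a-z0-9']+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    freq = Counter(w for words in tokenized for w in words)

    def score(i):
        # Average document frequency of the sentence's non-stopword words.
        words = tokenized[i]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Best sentences first; ties keep their original document order.
    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= limit:
            chosen.append(i)
            used += n
    # Emit the chosen sentences in document order so the result reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Because it only ever outputs complete sentences, readability stays close to approach 2, while the frequency score pulls in some of the coverage of approach 1.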
I thought this might be a good contest idea, as anyone who can read a file into a program can participate. Newbs can use simple sentence or word selection algorithms, while more advanced programmers can dip into areas of NLP (natural language processing) or anything else they can think of.
Just a thought...