I am going to write a demo that extracts the keywords and category of a web page. Does anyone know of any open-source projects, samples, documentation, or books I could start with?

BTW: I also need to handle multi-language pages, such as Japanese, which I know nothing about, so recommendations for tools that deal with multilingual text would be especially welcome. Thanks.
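A minimal sketch of a term-frequency baseline for the keyword half of this task, using only the Python standard library. The names `html_to_text`, `tokenize` and `top_keywords` are hypothetical helpers, not from any existing library, and the stopword list is a tiny placeholder. Because Japanese is written without spaces between words, whitespace splitting does not work. For runs of kana and kanji this sketch falls back to overlapping character bigrams, which is a crude approximation that needs no dictionary. It assumes the Unicode ranges U+3040 to U+30FF (Hiragana, Katakana) and U+4E00 to U+9FFF (common CJK ideographs) cover the Japanese text of interest.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed CJK ranges: Hiragana/Katakana (U+3040-U+30FF) and common ideographs (U+4E00-U+9FFF).
CJK = "\u3040-\u30ff\u4e00-\u9fff"
# Group 1: a run of CJK characters. Group 2: a run of other word characters (e.g. English).
TOKEN = re.compile(rf"([{CJK}]+)|([^\W{CJK}]+)")
# Placeholder stopword list; a real one would be much larger and per-language.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: strip tags and return the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Lowercased words for space-separated scripts, character bigrams for CJK runs."""
    tokens = []
    for match in TOKEN.finditer(text):
        cjk, word = match.group(1), match.group(2)
        if cjk:
            if len(cjk) == 1:
                tokens.append(cjk)
            else:
                # No word boundaries in Japanese: emit overlapping 2-character slices.
                tokens.extend(cjk[i:i + 2] for i in range(len(cjk) - 1))
        else:
            tokens.append(word.lower())
    return tokens


def top_keywords(text, k=5):
    """Return the k most frequent non-stopword tokens as (token, count) pairs."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(k)


if __name__ == "__main__":
    page = "<p>Tokyo tower</p><p>東京タワー tokyo</p><script>var x;</script>"
    print(top_keywords(html_to_text(page), 3))
```

For real Japanese pages, a morphological analyzer such as MeCab segments text into actual words and is likely a better choice than bigrams. Category detection is not covered by this sketch.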