I've downloaded the latest Wikipedia dump and I want to do some analysis on it. The problem is that it is a single XML file of nearly 5 GB, so I'm not sure how to process it. The most basic feature I'm interested in is a function that creates a list containing each page's character count. From there I would be able to implement the more advanced stuff.
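
To make the goal concrete, here is a rough sketch of the kind of function I mean. It assumes Python and the standard library's streaming parser `xml.etree.ElementTree.iterparse`, and it assumes the dump follows the usual MediaWiki export layout, where each `<page>` element contains a `<revision>` with a `<text>` element. The names `page_char_counts` and `local_name` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Strip a "{namespace}" prefix, if any, so "{...}page" becomes "page".
    return tag.rsplit("}", 1)[-1]


def page_char_counts(source):
    """Return a list with the character count of each page's text.

    `source` is a file path or a binary file object. The file is parsed
    incrementally, so the whole dump is never held in memory at once.
    """
    counts = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element

    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue

        # Assumption: count the first <text> element found inside the page.
        length = 0
        for child in elem.iter():
            if local_name(child.tag) == "text":
                length = len(child.text or "")
                break
        counts.append(length)

        # Drop the pages processed so far so memory use stays flat.
        root.clear()

    return counts
```

It would be called with the path to the uncompressed dump, for example `page_char_counts("dump.xml")`, where the file name is a placeholder. The resulting list holds one integer per page, which should fit in memory even though the XML itself does not.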