Webstemmer is a web crawler and HTML layout analyzer that automatically extracts the main text of a news site, without banners, ads, or navigation links mixed in.
J. Pasternack and D. Roth. WWW '09: Proceedings of the 18th international conference on World wide web, pages 971--980. New York, NY, USA, ACM, (2009)