Webstemmer is a web crawler and HTML layout analyzer that automatically extracts the main text of a news site without mixing in banners, ads, or navigation links.
F. Suchanek, G. Ifrim, and G. Weikum. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), pages 712-717. New York, NY, USA, ACM, 2006.
D. Downey, M. Broadhead, and O. Etzioni. Proc. of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI'07), Hyderabad, India, January 2007.