Lately I’ve been working on evaluating and comparing algorithms capable of extracting useful content from arbitrary HTML documents. I have made a feature-wise comparison of related software and APIs.
M. Wang. Proceedings of the Third International Joint Conference on Natural Language Processing, 2, pp. 841--846. Hyderabad, India, Asian Federation of Natural Language Processing, Association for Computational Linguistics, (January 2008)