Swish-e is a fast, flexible, free, open-source system for indexing collections of Web pages or other files. It is best suited to collections of a million documents or fewer. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel, and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases such as the MySQL® DBMS with very fast full-text searching.
Y. Yanbe, A. Jatowt, S. Nakamura, and K. Tanaka. JCDL '07: Proceedings of the 2007 conference on Digital libraries, page 107-116. New York, NY, USA, ACM Press, (2007)
A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. The Semantic Web: Research and Applications, volume 4011 of LNAI, page 411-426. Heidelberg, Springer, (June 2006)
A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. The Semantic Web: Research and Applications, volume 4011 of Lecture Notes in Computer Science, page 411-426. Heidelberg, Springer, (June 2006)
R. Jäschke, B. Krause, A. Hotho, and G. Stumme. Proceedings of the Second International Conference on Weblogs and Social Media (ICWSM 2008), AAAI Press, (2008)