Wmatrix is a software tool for corpus analysis and comparison. It provides a web interface to the USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains.
Xaira (XML Aware Indexing and Retrieval Architecture) is an open source application hosted on SourceForge. It supports indexing and analysis of large XML textual resources such as natural language corpora.
The Stanford WebBase project has been collecting topic-focused snapshots of Web sites. All the resulting archives are available to the public via fast download streams. For example, we collected pages from 350 sites every day for several weeks after the Hurricane Katrina disaster. We also collect pages from government Web sites on a regular basis.
J. Wermter and U. Hahn. 44th Annual Meeting of the Association for Computational Linguistics, pages 785-792. Sydney, Australia, Association for Computational Linguistics, (July 2006)
R. Sauer, J. Maurinsh, U. Reith, F. Fulle, K. N. Klotz, and C. E. Muller. Journal of Medicinal Chemistry (J Med Chem), 43 (3): 440-8 (10 February 2000)