This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
OpenNLP is an organizational center for open source projects related to natural language processing. It hosts a variety of java-based NLP tools which perform sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package.
ASV Toolbox is a modular collection of tools for the exploration of written language data. They work either on word lists or text and solve several linguistic classification and clustering tasks. The topics covered contain language detection, POS-tagging, base form reduction, named entity recognition, and terminology extraction.
TIGER API is a library which allows Java programmers to easily access the structure of any corpus given as a TIGER-XML file. It can process the TIGER corpus and any other corpus encoded in TIGER-XML. The underlying API specifies a Java object model for corpora encoded in TIGER-XML and provides methods for traversing syntax trees and accessing elements such as sentences, syntax graph nodes, and their attributes.