* Morphology and computational linguistics for absolute beginners
* The role of morphology in computational linguistics
* Morphology: tasks and approaches to solving them
* Pseudo-lemmatization, compounds, and other strange words
NGramJ is a Java-based library containing two types of n-gram-based applications. Its major focus is to provide robust, state-of-the-art language recognition.
Online demo of the TreeTagger, a tool for annotating text with part-of-speech and lemma information, developed at the Institute for Computational Linguistics of the University of Stuttgart.
Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaptation, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both to end users and to researchers.
SentiWordNet is a lexical resource for opinion mining. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity.
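A small Python sketch of how the three scores of one synset might be represented. It assumes the tab-separated layout used by the SentiWordNet 3.0 distribution as I understand it (POS, ID, PosScore, NegScore, SynsetTerms, Gloss), where only positivity and negativity are stored and objectivity is derived as 1 - (positivity + negativity). The class name, function name, and sample line are hypothetical and not taken from the real resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    pos_tag: str        # part of speech of the synset, e.g. "a" for adjective
    offset: str         # synset identifier
    terms: tuple        # lemmas in the synset, sense numbers stripped
    positivity: float
    negativity: float

    @property
    def objectivity(self):
        # Assumption: the three scores of a synset sum to 1.
        return 1.0 - (self.positivity + self.negativity)


def parse_line(line):
    """Parse one tab-separated entry (assumed column order, see the lead-in)."""
    pos_tag, offset, pos_score, neg_score, terms, _gloss = line.rstrip("\n").split("\t", 5)
    lemmas = tuple(term.rsplit("#", 1)[0] for term in terms.split())
    return SynsetScores(pos_tag, offset, lemmas, float(pos_score), float(neg_score))


# Hypothetical entry, made up for this example.
SAMPLE_LINE = "a\t00000001\t0.625\t0.125\texample#1 sample#2\ta made-up gloss"
scores = parse_line(SAMPLE_LINE)
print(scores.terms, scores.positivity, scores.negativity, scores.objectivity)
```

Deriving objectivity as a property rather than storing it keeps the three values consistent by construction.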
All programs and resources on this list are free, i.e. available at no cost (for research purposes), applicable to German-language texts, and ready to use immediately, i.e. they do not first have to be trained with the help of, for example, annotated corpora. The list is, of course, incomplete (as of 22 May 2007).
TIGER API is a library which allows Java programmers to easily access the structure of any corpus given as a TIGER-XML file. It can process the TIGER corpus and any other corpus encoded in TIGER-XML. The underlying API specifies a Java object model for corpora encoded in TIGER-XML and provides methods for traversing syntax trees and accessing elements such as sentences, syntax graph nodes, and their attributes.
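The kind of traversal described above can be sketched in Python with the standard library. This is not the TIGER API itself, which is a Java library. The fragment below is hand-written and follows the TIGER-XML element names as I understand them (`s`, `graph`, `terminals`/`t`, `nonterminals`/`nt`, `edge`). The `bracketed` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written toy sentence in (assumed) TIGER-XML layout, not from the TIGER corpus.
TIGER_XML = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Die" pos="ART"/>
      <t id="s1_2" word="Katze" pos="NN"/>
      <t id="s1_3" word="schnurrt" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_501" cat="NP">
        <edge label="NK" idref="s1_1"/>
        <edge label="NK" idref="s1_2"/>
      </nt>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_501"/>
        <edge label="HD" idref="s1_3"/>
      </nt>
    </nonterminals>
  </graph>
</s>
""".strip()


def bracketed(sentence):
    """Render the syntax graph of one <s> element as a labeled bracket string."""
    graph = sentence.find("graph")
    # Index terminals and nonterminals by id so edges can be resolved.
    nodes = {node.get("id"): node for node in graph.iter() if node.tag in ("t", "nt")}

    def render(node_id):
        node = nodes[node_id]
        if node.tag == "t":
            return f"({node.get('pos')} {node.get('word')})"
        children = " ".join(
            f"{edge.get('label')}:{render(edge.get('idref'))}" for edge in node.findall("edge")
        )
        return f"({node.get('cat')} {children})"

    return render(graph.get("root"))


sentence = ET.fromstring(TIGER_XML)
print(bracketed(sentence))
# prints: (S SB:(NP NK:(ART Die) NK:(NN Katze)) HD:(VVFIN schnurrt))
```

The sketch handles only primary edges. Secondary edges and crossing branches, which TIGER-XML can also encode, are left out.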
Speech technology potentially allows everyone to participate in today's information revolution and can bridge the language barrier. Unfortunately, constructing speech processing systems requires significant resources. With some 6900 languages in the world, speech processing has traditionally been prohibitively expensive for all but the most economically viable languages. In spite of recent improvements in speech processing, supporting new languages is a skilled job requiring significant effort from trained individuals. SPICE aims to overcome both limitations by providing an interactive language creation and evaluation toolkit that allows everyone to develop speech processing models, to collect appropriate data for model building, and to evaluate the results, enabling iterative improvements.
OpenNLP is an organizational center for open source projects related to natural language processing. It hosts a variety of Java-based NLP tools which perform sentence detection, tokenization, POS tagging, chunking and parsing, named-entity detection, and coreference resolution using the OpenNLP Maxent machine learning package.
The objective of the ACE Program is to develop extraction technology to support automatic processing of source language data (in the form of natural text, and as text derived from ASR and OCR). This includes classification, filtering, and selection based on the language content of the source data, i.e., based on the meaning conveyed by the data. Thus the ACE program requires the development of technologies that automatically detect and characterize this meaning. The ACE research objectives are viewed as the detection and characterization of Entities, Relations, and Events.
This is an overview of the open source NLP and machine learning tools for text mining, information extraction, text classification, clustering, approximate string matching, language parsing and tagging, and more.
Platform for Annotated Corpora in XML: an integrated tool for corpus linguists, built on Eclipse, Vex, Subversive, etc., for creating and editing transcriptions and annotations, querying, managing version-controlled data, and building a shippable corpus.