Source code to repeat the paper evaluation: We present the first unsupervised approach to the problem of learning a semantic parser, using Markov logic. Our USP system transforms dependency trees into quasi-logical forms, recursively induces lambda forms from these, and clusters them to abstract away syntactic variations of the same meaning. The MAP semantic parse of a sentence is obtained by recursively assigning its parts to lambda-form clusters and composing them. We evaluate our approach by using it to extract a knowledge base from biomedical abstracts and answer questions. USP substantially outperforms TextRunner, DIRT and an informed baseline on both precision and recall on this task.
Die Tübinger Baumbank des Deutschen / Schriftsprache (TüBa-D/Z) ist ein syntaktisch annotiertes Korpus auf der Grundlage der Zeitung "die tageszeitung" (taz). Sie umfasst zur Zeit ca. 36 000 Sätze bzw. 630 000 Worte.
Online Demo of the TreeTagger. A tool for annotating text with part-of-speech and lemma information which has been developed at the Institute for Computational Linguistics of the University of Stuttgart.
The TreeTagger is a tool for annotating text with part-of-speech and lemma information which has been developed within the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese and old French texts and is easily adaptable to other languages if a lexicon and a manually tagged training corpus are available.
TIGER API is a library which allows Java programmers to easily access the structure of any corpus given as a TIGER-XML file. It can process the TIGER corpus and any other corpus encoded in TIGER-XML. The underlying API specifies a Java object model for corpora encoded in TIGER-XML and provides methods for traversing syntax trees and accessing elements such as sentences, syntax graph nodes, and their attributes.
OpenNLP is an organizational center for open source projects related to natural language processing. It hosts a variety of java-based NLP tools which perform sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package.
Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general, and with a certain amount of adaption, Shalmaneser should be usable for other paradigms (e.g., PropBank roles) as well. Shalmaneser caters both for end users, and for researchers.
SentiWordNet is a lexical resource for opinion mining. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity.
This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine.
Das NEGRA Korpus Version 2 besteht aus 355.096 Tokens (20.602 Sätzen) deutschen Zeitungstextes aus der Frankfurter Rundschau. Die Texte sind der CD "Multilingual Corpus 1" der European Corpus Initiative entnommen. Es basiert auf ca. 60.000 Tokens, die am Institut für maschinelle Sprachverarbeitung, Stuttgart, mit Parts-of-Speech annotiert wurden. Dieses Korpus wurde erweitert, ebenfalls mit Parts-of-Speech versehen und vollständig mit syntaktischen Strukturen annotiert. Der Aufbau des Korpus wurde in den Projekten NEGRA (DFG Sonderforschungsbereich 378, Projekt C3) und LINC (Universität des Saarlandes) in Saarbrücken durchgeführt.
E. Riloff. Connectionist, statistical, and symbolic approaches to learning for natural language processing, 1040, page 275--289. Heidelberg, DE, Springer Verlag, (1996)
M. Gamon. COLING '04: Proceedings of the 20th international conference on Computational Linguistics, page 841. Morristown, NJ, USA, Association for Computational Linguistics, (2004)
M. Collins, and N. Duffy. ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, page 263--270. Morristown, NJ, USA, Association for Computational Linguistics, (2002)
R. Grishman, and B. Sundheim. Proceedings of the 16th International Conference on Computational Linguistics (COLING), page 466--471. Kopenhagen, (1996)