Source code to repeat the paper evaluation: We present the first unsupervised approach to the problem of learning a semantic parser, using Markov logic. Our USP system transforms dependency trees into quasi-logical forms, recursively induces lambda forms from these, and clusters them to abstract away syntactic variations of the same meaning. The MAP semantic parse of a sentence is obtained by recursively assigning its parts to lambda-form clusters and composing them. We evaluate our approach by using it to extract a knowledge base from biomedical abstracts and answer questions. USP substantially outperforms TextRunner, DIRT and an informed baseline on both precision and recall on this task.
ConceptNet represents data in the form of a semantic network, and makes it available to be used in natural language processing and intelligent user interfaces.
This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
This is an overview of the open source NLP and machine learning tools for text mining, information extraction, text classification, clustering, approximate string matching, language parsing and tagging, and more.
Extensible Dependency Grammar (XDG) is a general framework for dependency grammar, with multiple levels of linguistic representations called dimensions, e.g. grammatical function, word order, predicate-argument structure, scope structure, information structure and prosodic structure. It is articulated around a graph description language for multi-dimensional attributed labeled graphs.
An XDG grammar is a constraint that describes the valid linguistic signs as n-dimensional attributed labeled graphs, i.e. n-tuples of graphs sharing the same set of attributed nodes, but having different sets of labeled edges. All aspects of these signs are stipulated explicitly by principles: the class of models for each dimension, additional properties that they must satisfy, how one dimension must relate to another, and even lexicalization.
R. Grishman, and B. Sundheim. Proceedings of the 16th International Conference on Computational Linguistics (COLING), page 466--471. Kopenhagen, (1996)
K. Toutanova, and C. Manning. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), page 63--70. (2000)
D. Roth, and W. tau Yih. HLT-NAACL 2004 Workshop: Eighth Conference on Computational Natural Language Learning (CoNLL-2004), page 1--8. Boston, Massachusetts, USA, Association for Computational Linguistics, (May 2004)