The Javatools are a collection of Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They were developed by Fabian M. Suchanek for the YAGO-NAGA project. The Javatools are licensed under a Creative Commons Attribution 3.0 License by the YAGO-NAGA team.
Webstemmer is a web crawler and HTML layout analyzer that automatically extracts the main text of news sites without banners, ads, or navigation links mixed in.
Provides free and open access to high-quality video lectures presented by distinguished scholars and scientists at important and prominent events, such as conferences, summer schools, workshops, and science promotion events, across many fields of science.
Source code for reproducing the paper's evaluation: We present the first unsupervised approach to the problem of learning a semantic parser, using Markov logic. Our USP system transforms dependency trees into quasi-logical forms, recursively induces lambda forms from these, and clusters them to abstract away syntactic variations of the same meaning. The MAP semantic parse of a sentence is obtained by recursively assigning its parts to lambda-form clusters and composing them. We evaluate our approach by using it to extract a knowledge base from biomedical abstracts and answer questions. USP substantially outperforms TextRunner, DIRT and an informed baseline on both precision and recall on this task.
The Tübinger Baumbank des Deutschen / Schriftsprache (TüBa-D/Z, Tübingen Treebank of Written German) is a syntactically annotated corpus based on the newspaper "die tageszeitung" (taz). It currently comprises approximately 36,000 sentences, or 630,000 words.
Online demo of the TreeTagger, a tool for annotating text with part-of-speech and lemma information that was developed at the Institute for Computational Linguistics of the University of Stuttgart.
The TreeTagger is a tool for annotating text with part-of-speech and lemma information that was developed within the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese, and Old French texts, and it is easily adaptable to other languages if a lexicon and a manually tagged training corpus are available.
T-Rex (Trainable Relation Extraction) is a highly configurable, machine-learning-based framework for information extraction from text, which includes tools for document classification, entity extraction, and relation extraction.
TimeML (Markup Language for Temporal and Event Expressions) is a robust specification language for events and temporal expressions in natural language.
C. Robertson, S. Geva, and R. Wolff. AusDM '06: Proceedings of the Fifth Australasian Conference on Data Mining and Analytics, pages 145--153. Darlinghurst, Australia, Australian Computer Society, Inc., (2006)
E. Riloff. Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, vol. 1040, pages 275--289. Heidelberg, DE, Springer-Verlag, (1996)
N. Chambers and D. Jurafsky. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 602--610. Suntec, Singapore, Association for Computational Linguistics, (August 2009)
L. Qian, G. Zhou, F. Kong, Q. Zhu, and P. Qian. ALPIT '08: Proceedings of the 2008 International Conference on Advanced Language Processing and Web Information Technology, (2008)
G. Zhou, M. Zhang, D. Ji, and Q. Zhu. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 728--736. (2007)