NLTK is a suite of open source Python modules, data, and documentation for research and development in natural language processing. It contains code supporting dozens of NLP tasks, along with 40 popular corpora and extensive documentation, including a 375-page online book. Distributions for Windows, Mac OS X, and Linux are available.
The Skeptic's Dictionary began with fewer than 50 entries in 1994 and has grown to more than 500 entries in 2007. Each month, the site gets about a million visitors and processes about 1.5 million page views. Thousands of readers have e-mailed comments, suggestions, and criticisms. In response, I've added many new entries since the first "printing." Many reader comments have been posted. Thanks to alert readers, numerous errors have been corrected.
Great open source software with a range of functionality for text and NLP support. It has components for rule- and dictionary-based extraction and for co-reference analysis.
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other tokens), although computational applications generally use more fine-grained POS tags like 'noun-plural'. This software is a Java implementation of the log-linear part-of-speech taggers...
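To make the idea of assigning fine-grained tags to tokens concrete, here is a minimal Python sketch of a most-frequent-tag baseline. It is not the log-linear tagger described above; the training sample, the function names `train_baseline` and `tag`, and the default tag are assumptions made for illustration. The tags follow the Penn Treebank style, where `NNS` marks a plural noun.

```python
from collections import Counter, defaultdict

# Tiny hand-made training sample (hypothetical data, for illustration only).
TRAIN = [
    [("the", "DT"), ("dogs", "NNS"), ("bark", "VBP")],
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("loud", "JJ"), ("bark", "NN")],
    [("dogs", "NNS"), ("bark", "VBP")],
]


def train_baseline(sentences):
    """Map each lowercased word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default="NN"):
    """Tag tokens by dictionary lookup; unseen words get the default tag."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]


MODEL = train_baseline(TRAIN)
print(tag(["The", "dogs", "bark"], MODEL))
# -> [('The', 'DT'), ('dogs', 'NNS'), ('bark', 'VBP')]
```

A lookup baseline like this ignores context entirely, so "bark" always receives its most frequent tag; statistical taggers improve on it by conditioning on neighbouring words and tags.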
ConceptNet is a freely available commonsense knowledge base and natural-language-processing toolkit that supports many practical textual-reasoning tasks over real-world documents out of the box (without additional statistical training), including
The goal of the PDTB project is to develop a large scale corpus annotated with information related to discourse structure. While there are many aspects of discourse that are crucial to a complete understanding of natural language, the Penn Discourse Treebank (PDTB) focuses on encoding coherence relations associated with discourse connectives. The annotations include the argument structure of the connectives, thus exposing a clearly defined level of discourse structure which will support the extraction of a range of inferences associated with discourse connectives. Some other annotated features associated with discourse connectives and their arguments include sense distinctions for discourse connectives, and attribution-related features for both connectives and their arguments.
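The annotation layers listed above (a connective, its two arguments, a sense, and attribution features) can be pictured as a simple record. The following Python sketch is only an illustration of that structure: the class name, field names, and label values are hypothetical and do not reproduce the PDTB file format or its sense inventory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DiscourseRelation:
    """One connective with its two arguments (hypothetical schema)."""
    connective: str
    arg1: str
    arg2: str
    sense: Optional[str] = None        # illustrative label, not a PDTB sense tag
    attribution: Optional[str] = None  # illustrative label for who asserts the relation


def relations_with_connective(relations: List[DiscourseRelation],
                              connective: str) -> List[DiscourseRelation]:
    """Return all relations anchored by the given connective (case-insensitive)."""
    wanted = connective.lower()
    return [r for r in relations if r.connective.lower() == wanted]


RELATIONS = [
    DiscourseRelation("because", "The match was cancelled", "it rained all day",
                      sense="cause", attribution="writer"),
    DiscourseRelation("but", "The forecast was good", "the storm arrived early",
                      sense="contrast", attribution="writer"),
]

for rel in relations_with_connective(RELATIONS, "Because"):
    print(rel.arg1, "|", rel.connective, "|", rel.arg2)
```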
The Tübingen Treebank of Written German (TüBa-D/Z) is a syntactically annotated corpus based on the newspaper "die tageszeitung" (taz). It currently comprises approximately 36,000 sentences, or 630,000 words.
TIGER API is a library which allows Java programmers to easily access the structure of any corpus given as a TIGER-XML file. It can process the TIGER corpus and any other corpus encoded in TIGER-XML. The underlying API specifies a Java object model for corpora encoded in TIGER-XML and provides methods for traversing syntax trees and accessing elements such as sentences, syntax graph nodes, and their attributes.
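As a rough picture of what traversing a TIGER-XML syntax graph involves, here is a Python sketch using only the standard library. It is not the Java TIGER API and does not mirror its object model: the functions `load_sentences` and `bracket` and the two-word sample are assumptions for illustration. The sample follows the usual TIGER-XML layout of `t` terminals and `nt` nonterminals linked by `edge` elements.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written TIGER-XML fragment (illustrative, not from a real corpus).
SAMPLE = """
<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Hunde" pos="NN"/>
          <t id="s1_2" word="bellen" pos="VVFIN"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="S">
            <edge label="SB" idref="s1_1"/>
            <edge label="HD" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>
"""


def load_sentences(xml_text):
    """Parse TIGER-XML text into a list of sentence dictionaries."""
    root = ET.fromstring(xml_text.strip())
    sentences = []
    for s in root.iter("s"):
        nodes = {}
        for t in s.iter("t"):  # terminals: one per token
            nodes[t.get("id")] = {"label": t.get("pos"), "word": t.get("word"),
                                  "children": []}
        for nt in s.iter("nt"):  # nonterminals: phrases with labelled edges
            children = [(e.get("label"), e.get("idref")) for e in nt.findall("edge")]
            nodes[nt.get("id")] = {"label": nt.get("cat"), "word": None,
                                   "children": children}
        sentences.append({"id": s.get("id"),
                          "root": s.find("graph").get("root"),
                          "nodes": nodes})
    return sentences


def bracket(sentence, node_id=None):
    """Render the graph below node_id (default: the root) as a bracketed string."""
    node = sentence["nodes"][node_id or sentence["root"]]
    if node["word"] is not None:
        return f"({node['label']} {node['word']})"
    inner = " ".join(bracket(sentence, child) for _, child in node["children"])
    return f"({node['label']} {inner})"


print(bracket(load_sentences(SAMPLE)[0]))
# -> (S (NN Hunde) (VVFIN bellen))
```

The sketch follows edges in document order and ignores secondary edges and crossing branches, both of which a real TIGER-XML corpus may contain.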
Version 2 of the NEGRA corpus consists of 355,096 tokens (20,602 sentences) of German newspaper text from the Frankfurter Rundschau. The texts were taken from the CD "Multilingual Corpus 1" of the European Corpus Initiative. The corpus is based on approximately 60,000 tokens that were annotated with parts of speech at the Institut für maschinelle Sprachverarbeitung, Stuttgart. This corpus was extended, likewise tagged with parts of speech, and fully annotated with syntactic structures. The corpus was built in Saarbrücken within the projects NEGRA (DFG Sonderforschungsbereich 378, Project C3) and LINC (Universität des Saarlandes).
H. Chang and A. McCallum. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8048–8073. Dublin, Ireland, Association for Computational Linguistics, (May 2022)
T. Ziegenbein, S. Syed, F. Lange, M. Potthast, and H. Wachsmuth. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, pages 4344–4363. Association for Computational Linguistics (ACL), (July 2023). Funding information: This project has been partially funded by the German Research Foundation (DFG) within the project OASiS, project number 455913891, as part of the Priority Program "Robust Argumentation Machines (RATIO)" (SPP-1999). We would like to thank the participants of our study and the anonymous reviewers for the feedback and their time. Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023; conference dates: July 9–14, 2023.
S. Syed, T. Ziegenbein, P. Heinisch, H. Wachsmuth, and M. Potthast. Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 114–129. Prague, Czechia, Association for Computational Linguistics, (September 2023)
M. Stahl and H. Wachsmuth. Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges, pages 31–36. (September 2023)
G. Skitalinskaya and H. Wachsmuth. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15799–15816. Association for Computational Linguistics (ACL), (July 2023). Funding information: We thank Andreas Breiter for his valuable feedback on early drafts, and the anonymous reviewers for their helpful comments. This work was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under project number 374666841, SFB 1342. Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023; conference dates: July 9–14, 2023.
G. Skitalinskaya, M. Spliethöver, and H. Wachsmuth. Proceedings of the 16th International Natural Language Generation Conference, pages 134–152. (2023)
M. Sengupta. Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4636–4659. Association for Computational Linguistics (ACL), (December 2023)
Z. Nouri, N. Prakash, U. Gadiraju, and H. Wachsmuth. IUI 2023 - Proceedings of the 28th International Conference on Intelligent User Interfaces, pages 737–749. United States, Association for Computing Machinery (ACM), (March 27, 2023)
G. Lapesa, E. M. Vecchi, S. Villata, and H. Wachsmuth. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), (May 2023). Funding information: Gabriella Lapesa and Eva Maria Vecchi are funded by the Bundesministerium für Bildung und Forschung (BMBF), project E-DELIB (Powering up E-deliberation: towards AI-supported moderation). Serena Villata is supported by the French government, through the 3IA Côte d'Azur Investments in the Future project managed by the ANR with the reference number ANR-19-P3IA-0002. Conference: 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023; conference dates: May 2–4, 2023.