Finding important information in unstructured text
From Language and Information Technologies
The vast majority of the information we deal with in everyday life consists of raw, unstructured text, in which the most important facts or concepts are not readily available but are buried among the many details that accompany them. Handling and digesting the sheer amount of information we are exposed to requires more sophisticated procedures that surface the important parts of a text and let us process more information in less time. The goal of this project is to develop robust and accurate techniques for automatically extracting important information from unstructured text, in the form of keyphrases (keyphrase extraction) or entire sentences (extractive summarization).
Funded by Google
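As a rough illustration of the two task types named above, the following sketch ranks words and sentences by simple term frequency. It is a generic baseline under stated assumptions, not the project's method: the stopword list, the function names, and the scoring rule are all hypothetical choices made for this example, and it returns single keywords rather than multi-word keyphrases.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "it",
             "that", "this", "for", "on", "with", "as", "by", "be", "we", "not"}


def tokenize(text):
    """Return lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_keywords(text, k=3):
    """Return the k most frequent non-stopword tokens (single-word candidates)."""
    return [w for w, _ in Counter(tokenize(text)).most_common(k)]


def summarize(text, n=1):
    """Return the n highest-scoring sentences, kept in their original order.

    A sentence's score is the mean document-level frequency of its tokens.
    """
    freq = Counter(tokenize(text))
    sentences = split_sentences(text)

    def score(sentence):
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    doc = "Cats chase mice. Cats sleep often. Dogs bark."
    print(extract_keywords(doc, k=1))
    print(summarize(doc, n=2))
```

Frequency counts are only a starting point; they ignore phrase boundaries, position, and redundancy between selected sentences, which is where more robust techniques would differ.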
I am investigating computational models for linguistic structures and processes, with application to language technologies and to the documentation of endangered languages. My current focus is on efficient querying of databases of hierarchically annotated data. After completing a PhD on computational phonology at the University of Edinburgh in 1990, I worked on a series of European research projects and conducted linguistic fieldwork in Cameroon with SIL. In 1998 I moved to the University of Pennsylvania, becoming Associate Director of the LDC and working on models and tools for linguistic annotation. In 2002 I returned home to Australia and established the Melbourne University Language Technology Group. In 2007 I was awarded the Kelvin Medal for excellence in teaching.
Key Activities: Coordinating first year Informatics; developing the Natural Language Toolkit; writing a textbook on NLP; leading the Language Technology Group; working on an NSF project on Querying Linguistic Databases; and editing Cambridge Studies in Natural Language Processing and the ACL Anthology.
Key Publications: Natural Language Processing in Python; Computational phonology: A constraint-based approach (Cambridge); A formal framework for linguistic annotation (Speech Communication); Seven dimensions of portability for language documentation and description (Language); Designing and evaluating an XPath dialect for linguistic queries (ICDE).
This relates to the recent Slashdot-posted paper about the world being a VR. If the human mind is indeed non-computable, the world can't be a VR. Cf. On Intelligence.
The incunabula catalog of the Augsburg University Library (UB Augsburg) lists the roughly 1,100 incunabula of the Oettingen-Wallerstein Library. Provenance searches are possible.
H. Chang, Z. Yao, A. Gon, H. Yu, and A. McCallum. Findings of the Association for Computational Linguistics: ACL 2023, page 12707--12730. Toronto, Canada, Association for Computational Linguistics, (July 2023)
H. Chang, and A. McCallum. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), page 8048--8073. Dublin, Ireland, Association for Computational Linguistics, (May 2022)
T. Ziegenbein, S. Syed, F. Lange, M. Potthast, and H. Wachsmuth. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, page 4344--4363. Association for Computational Linguistics (ACL), (July 2023). Funding information: This project has been partially funded by the German Research Foundation (DFG) within the project OASiS, project number 455913891, as part of the Priority Program “Robust Argumentation Machines (RATIO)” (SPP-1999). We would like to thank the participants of our study and the anonymous reviewers for the feedback and their time. Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023; conference date: 09-07-2023 through 14-07-2023.
S. Syed, T. Ziegenbein, P. Heinisch, H. Wachsmuth, and M. Potthast. Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, page 114--129. Prague, Czechia, Association for Computational Linguistics, (September 2023)
M. Stahl, and H. Wachsmuth. Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges, page 31--36. (September 2023)
G. Skitalinskaya, and H. Wachsmuth. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), page 15799--15816. Association for Computational Linguistics (ACL), (July 2023). Funding information: We thank Andreas Breiter for his valuable feedback on early drafts, and the anonymous reviewers for their helpful comments. This work was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under project number 374666841, SFB 1342. Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023; conference date: 09-07-2023 through 14-07-2023.
G. Skitalinskaya, M. Spliethöver, and H. Wachsmuth. Proceedings of the 16th International Natural Language Generation Conference, page 134--152. (2023)
M. Sengupta. Findings of the Association for Computational Linguistics: EMNLP 2023, page 4636--4659. Association for Computational Linguistics (ACL), (December 2023)
Z. Nouri, N. Prakash, U. Gadiraju, and H. Wachsmuth. IUI 2023 - Proceedings of the 28th International Conference on Intelligent User Interfaces, page 737--749. United States, Association for Computing Machinery (ACM), (Mar 27, 2023)