
Evidence-based Information Extraction for High Accuracy Citation and Author Name Identification

Large Scale Semantic Access to Content (Text, Image, Video, and Sound), pages 618-632. Paris, France, LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE, (2007)

Abstract

Citations play an essential role in navigating academic literature and following chains of evidence in research. With the growing availability of large digital archives of scientific papers, the automated extraction and analysis of citations is becoming increasingly relevant. However, existing approaches to citation extraction still fall short of the high accuracy required to build more sophisticated and reliable tools for citation analysis and corpus navigation. In this paper, we present techniques for high-accuracy extraction of citations and references from academic papers. By collecting multiple sources of evidence about entities from documents, and by integrating citation extraction, reference segmentation, and citation-reference matching, we are able to significantly improve performance in subtasks including citation identification, author named entity recognition, and citation-reference matching. Applying our algorithm to previously unseen documents, we demonstrate high F-measure performance of 0.980 for citation extraction, 0.983 for author named entity recognition, and 0.948 for citation-reference matching.

