
Conditional Random Fields for Local Adaptive Reference Extraction

M. Toepfer, P. Kluegl, A. Hotho, and F. Puppe.
Workshop on Knowledge Discovery, Data Mining, Machine Learning (KDML 2010), 2010


Abstract

The accurate extraction of bibliographic information from scientific publications is an active field of research. Machine learning methods, especially sequence labeling approaches like Conditional Random Fields (CRF), are often applied to this reference extraction task, but they still suffer from the ambiguity of reference notation. Reference sections follow a predefined style guide and contain only homogeneous references. Therefore, other references in the same paper or journal can often provide evidence of how the fields of a reference are correctly labeled. We propose a novel approach that exploits the similarities within a document. Our process model uses information from unlabeled documents directly during the extraction task in order to adapt automatically to the perceived style guide. This is implemented by changing the manifestation of the features for the applied CRF. The experimental results show considerable improvements compared to the common approach. We achieve an average F1 score of 96.7% and an instance accuracy of 85.4% on the test data set.
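The idea of adapting features to the style perceived within one document can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `year_style` and `local_features`, the feature keys, and the single "year placement" cue are all hypothetical, chosen only to show how a document-wide observation might be fed back into the per-token features that a CRF would consume.

```python
import re
from collections import Counter

# Hypothetical cue: a four-digit year, optionally wrapped in parentheses.
YEAR = re.compile(r"\(?\b(19|20)\d{2}\b\)?")


def year_style(reference: str) -> str:
    """Return a coarse label for how and where the year is written."""
    match = YEAR.search(reference)
    if match is None:
        return "no_year"
    bracketed = match.group(0).startswith("(")
    early = match.start() < len(reference) / 2
    return ("paren" if bracketed else "plain") + ("_early" if early else "_late")


def local_features(references: list[str]) -> list[list[dict]]:
    """Build per-token feature dicts that include a document-level style cue.

    The dominant year style is estimated from all (unlabeled) references of
    the same document and attached to every token, so a sequence labeler
    could condition on it. This is an assumed, simplified stand-in for
    "changing the manifestation of the features".
    """
    styles = [year_style(ref) for ref in references]
    dominant, _ = Counter(styles).most_common(1)[0]
    features = []
    for ref, style in zip(references, styles):
        features.append([
            {
                "token": tok.lower(),
                "is_year": bool(YEAR.fullmatch(tok.strip(".,;:"))),
                "doc_year_style": dominant,
                "matches_doc_style": style == dominant,
            }
            for tok in ref.split()
        ])
    return features
```

Given a list of reference strings from one document, `local_features` returns one list of feature dictionaries per reference. A reference whose own year placement deviates from the document's dominant style is flagged through `matches_doc_style`, which is the kind of local evidence the abstract alludes to.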

BibTeX key: 2010-LWA-TKHP
entry type: inproceedings
booktitle: Workshop on Knowledge Discovery, Data Mining, Machine Learning (KDML 2010)
year: 2010
Document: http://ki.informatik.uni-wuerzburg.de/papers/pkluegl/2010-LWA-CRFLAER.pdf
