Article

Neural ParsCit: a deep learning-based reference string parser

A. Prasad, M. Kaur, and M. Kan.
International Journal on Digital Libraries, 19 (4): 323--337 (Nov 1, 2018)
DOI: 10.1007/s00799-018-0242-1

Abstract

We present a deep learning approach for the core digital libraries task of parsing bibliographic reference strings. We deploy the state-of-the-art long short-term memory (LSTM) neural network architecture, a variant of a recurrent neural network, to capture long-range dependencies in reference strings. We explore word embeddings and character-based word embeddings as an alternative to handcrafted features. We incrementally experiment with features, architectural configurations, and the diversity of the dataset. Our final model is an LSTM-based architecture, which layers a linear chain conditional random field (CRF) over the LSTM output. In extensive experiments in both English in-domain (computer science) and out-of-domain (humanities) test cases, as well as multilingual data, our results show a significant gain ($p<0.01$) over the reported state-of-the-art CRF-only-based parser.

BibTeX key: prasad2018neural
entry type: article
year: 2018
month: nov
day: 01
journal: International Journal on Digital Libraries
number: 4
pages: 323--337
volume: 19
issn: 1432-1300
DOI: 10.1007/s00799-018-0242-1
url: https://doi.org/10.1007/s00799-018-0242-1
