When Printed Hypertexts Go Digital: Information Extraction from the Parsing of Indices
M. Romanello, M. Berti, A. Babeu, and G. Crane. HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia, New York, NY, USA, ACM, (July 2009)
Abstract
Modern critical editions of ancient works generally include
manually created indices of other sources quoted in the text.
Since indices can be considered as a form of domain specific
language, the paper presents a parsing-based approach to
the problem of extracting information from them to support
the creation of a collection of fragmentary texts. This paper
first considers the characteristics and structure of quotation
indices and their importance when dealing with fragmentary
texts. It then presents the results of applying a fuzzy parser
to the OCR transcription of an index of quotations to extract
information from potentially noisy input.
%0 Conference Paper
%1 romanello2009printed
%A Romanello, Matteo
%A Berti, Monica
%A Babeu, Alison
%A Crane, Gregory
%B HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia
%C New York, NY, USA
%D 2009
%I ACM
%K extraction ht2009 hypertexts indices information parsing poster pp159 printed
%T When Printed Hypertexts Go Digital: Information Extraction from the Parsing of Indices
%X Modern critical editions of ancient works generally include
manually created indices of other sources quoted in the text.
Since indices can be considered as a form of domain specific
language, the paper presents a parsing-based approach to
the problem of extracting information from them to support
the creation of a collection of fragmentary texts. This paper
first considers the characteristics and structure of quotation
indices and their importance when dealing with fragmentary
texts. It then presents the results of applying a fuzzy parser
to the OCR transcription of an index of quotations to extract
information from potentially noisy input.
@inproceedings{romanello2009printed,
abstract = {Modern critical editions of ancient works generally include
manually created indices of other sources quoted in the text.
Since indices can be considered as a form of domain specific
language, the paper presents a parsing-based approach to
the problem of extracting information from them to support
the creation of a collection of fragmentary texts. This paper
first considers the characteristics and structure of quotation
indices and their importance when dealing with fragmentary
texts. It then presents the results of applying a fuzzy parser
to the OCR transcription of an index of quotations to extract
information from potentially noisy input.},
added-at = {2009-06-16T15:00:02.000+0200},
address = {New York, NY, USA},
author = {Romanello, Matteo and Berti, Monica and Babeu, Alison and Crane, Gregory},
biburl = {https://www.bibsonomy.org/bibtex/278dc574ec491997f63bafda3f37b8006/ht09},
booktitle = {HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia},
interhash = {d69e2ace20fdb0ef9e43dbf6b16a5f33},
intrahash = {78dc574ec491997f63bafda3f37b8006},
keywords = {extraction ht2009 hypertexts indices information parsing poster pp159 printed},
month = {July},
paperid = {pp159},
publisher = {ACM},
session = {Poster},
timestamp = {2009-06-16T15:00:07.000+0200},
title = {When Printed Hypertexts Go Digital: Information Extraction from the Parsing of Indices},
year = 2009
}