
When Printed Hypertexts Go Digital: Information Extraction from the Parsing of Indices

, , , and . HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia, New York, NY, USA, ACM, (July 2009)

Abstract

Modern critical editions of ancient works generally include manually created indices of other sources quoted in the text. Since indices can be considered a form of domain-specific language, the paper presents a parsing-based approach to extracting information from them to support the creation of a collection of fragmentary texts. The paper first considers the characteristics and structure of quotation indices and their importance when dealing with fragmentary texts. It then presents the results of applying a fuzzy parser to the OCR transcription of an index of quotations in order to extract information from potentially noisy input.
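The abstract does not describe the authors' parser, so the following is only a minimal sketch of the general idea of tolerant ("fuzzy") parsing of a noisy index line, not the paper's method. It assumes a hypothetical entry format `Author locus: page, page`, a hypothetical list of canonical author abbreviations, and a hypothetical table of OCR letter-for-digit confusions. Approximate matching uses Python's standard `difflib.get_close_matches`.

```python
import difflib
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical canonical author abbreviations; a real index would define its own.
KNOWN_AUTHORS = ["Hdt.", "Hes.", "Hom.", "Pind.", "Thuc."]

# Assumed OCR confusions in numeric fields (letters misread for digits).
OCR_DIGIT_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})

# A locus is assumed to be dot-separated numbers, e.g. book.chapter.
LOCUS_RE = re.compile(r"\d+(?:\.\d+)*")


@dataclass
class IndexEntry:
    author: str
    locus: str
    pages: List[int] = field(default_factory=list)


def parse_entry(line: str) -> Optional[IndexEntry]:
    """Parse a hypothetical 'Author locus: page, page' line, tolerating OCR noise."""
    head, sep, tail = line.partition(":")
    if not sep:
        return None  # no separator between the citation and its page references
    parts = head.split(maxsplit=1)
    if len(parts) != 2:
        return None
    raw_author, raw_locus = parts
    # Fuzzy step: snap a possibly misrecognized abbreviation to the closest known one.
    candidates = difflib.get_close_matches(raw_author, KNOWN_AUTHORS, n=1, cutoff=0.6)
    if not candidates:
        return None
    # Repair digit-like letters before validating the numeric locus.
    locus = raw_locus.strip().translate(OCR_DIGIT_FIXES)
    if not LOCUS_RE.fullmatch(locus):
        return None
    pages = [int(p) for p in re.findall(r"\d+", tail.translate(OCR_DIGIT_FIXES))]
    return IndexEntry(candidates[0], locus, pages)


# Hypothetical noisy OCR line: transposed letters in the author, 'l' read for '1'.
print(parse_entry("Thcu. l.22: 34, ll8"))
# IndexEntry(author='Thuc.', locus='1.22', pages=[34, 118])
```

The design choice here is to reject a line (return `None`) rather than guess when no known abbreviation is close enough or the locus is not numeric after repair, so that uncertain entries can be routed to manual review instead of silently corrupting the extracted data.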
