Glyph Miner: A System for Efficiently Extracting Glyphs from Early Prints in the Context of OCR.

B. Budig, T. van Dijk, and F. Kirchner. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, pages 31–34. ACM, 2016.

Abstract

While off-the-shelf OCR systems work well on many modern documents, the heterogeneity of early prints provides a significant challenge. To achieve good recognition quality, existing software must be "trained" specifically to each particular corpus. This is a tedious process that involves significant user effort. In this paper we demonstrate a system that generically replaces a common part of the training pipeline with a more efficient workflow: Given a set of scanned pages of a historical document, our system uses an efficient user interaction to semi-automatically extract large numbers of occurrences of glyphs indicated by the user. In a preliminary case study, we evaluate the effectiveness of our approach by embedding our system into the workflow at the University Library Würzburg.

Links and resources

BibTeX key: conf/jcdl/BudigDK16
entry type: inproceedings
booktitle: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
year: 2016
pages: 31--34
publisher: ACM
series: JCDL '16
crossref: conf/jcdl/2016
ee: http://doi.acm.org/10.1145/2910896.2910915
isbn: 978-1-4503-4229-2
Document: http://www1.pub.informatik.uni-wuerzburg.de/pub/budig/papers/JCDL-2016_Budig_vanDijk_Kirchner.pdf
