Inproceedings

Indexing Historical Documents by Word Shape Signatures

, and .
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pages 362-366. (September 2007)
DOI: 10.1109/ICDAR.2007.4378733

Abstract

In this paper, a word spotting approach to indexing archival image documents is presented. Indices are constructed from keyword images, and the spotting strategy is formulated on an indexing-by-shape basis. The well-known shape context descriptor is used to compute word image signatures from skeleton points. Codewords are then extracted from thresholded shape contexts, which yields a simpler and more compact representation based on bit vectors. Document images are roughly segmented into words, and a lookup table is constructed in which each word subimage is taken as a bin. Keyword images are spotted in documents by a voting strategy that indexes into the lookup table by codewords and votes into the corresponding bins. The approach is illustrated with a real application scenario consisting of documents from a digital archive of the Spanish Civil War.
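The pipeline described in the abstract (shape contexts from skeleton points, thresholding into bit-vector codewords, a lookup table of word bins, and voting) can be illustrated with a minimal sketch. This is not the authors' implementation. Several details are assumptions: one shape context per skeleton point, 5 log-spaced radial by 12 angular bins with radii normalized by the mean pairwise distance (as in the original shape context formulation), a presence threshold of one point per bin, and exact-match lookup of codewords. All function names and parameters are hypothetical, and skeleton extraction and word segmentation are assumed to have already produced a list of (x, y) points per word.

```python
import math
from collections import Counter, defaultdict

# Hypothetical parameters (not taken from the paper): 5 log-spaced radial
# bins x 12 angular bins, radii normalized by the mean pairwise distance.
N_R, N_THETA = 5, 12
R_INNER, R_OUTER = 0.125, 2.0
BIT_THRESHOLD = 1  # a bin becomes a 1-bit if it holds at least this many points


def mean_pairwise_distance(points):
    """Scale factor that makes the descriptor scale invariant."""
    n = len(points)
    dists = [math.dist(points[a], points[b])
             for a in range(n) for b in range(a + 1, n)]
    mean = sum(dists) / len(dists) if dists else 0.0
    return mean if mean > 0 else 1.0


def radial_bin(r):
    """Log-spaced radial bin for a normalized distance, or None if out of range."""
    if r <= 0 or r >= R_OUTER:
        return None
    if r < R_INNER:
        return 0
    k = int(math.log(r / R_INNER) / math.log(R_OUTER / R_INNER) * N_R)
    return min(k, N_R - 1)


def shape_context(points, i, scale):
    """Log-polar histogram of all other points as seen from points[i]."""
    hist = [0] * (N_R * N_THETA)
    for j, q in enumerate(points):
        if j == i:
            continue
        rb = radial_bin(math.dist(points[i], q) / scale)
        if rb is None:
            continue
        theta = math.atan2(q[1] - points[i][1], q[0] - points[i][0]) % (2 * math.pi)
        tb = min(int(theta / (2 * math.pi) * N_THETA), N_THETA - 1)
        hist[rb * N_THETA + tb] += 1
    return hist


def codeword(hist):
    """Threshold a histogram into a bit vector, packed into an int."""
    return sum(1 << k for k, count in enumerate(hist) if count >= BIT_THRESHOLD)


def word_codewords(points):
    """Set of codewords for one word image, one per skeleton point."""
    scale = mean_pairwise_distance(points)
    return {codeword(shape_context(points, i, scale)) for i in range(len(points))}


def build_lookup_table(words):
    """Map each codeword to the word bins (subimage ids) that contain it."""
    table = defaultdict(list)
    for word_id, points in words.items():
        for cw in word_codewords(points):
            table[cw].append(word_id)
    return table


def spot(keyword_points, table):
    """Vote for word bins sharing codewords with the keyword; best first."""
    votes = Counter()
    for cw in word_codewords(keyword_points):
        for word_id in table.get(cw, ()):
            votes[word_id] += 1
    return votes.most_common()


if __name__ == "__main__":
    # Toy skeleton points (hypothetical): an L-shaped word and a flat stroke.
    words = {
        "A": [(0, y) for y in range(5)] + [(x, 0) for x in range(1, 5)],
        "B": [(x, 0) for x in range(9)],
    }
    table = build_lookup_table(words)
    # The same L shape, translated and scaled by 2, used as the keyword.
    keyword = [(2 * x + 10, 2 * y + 5) for x, y in words["A"]]
    print(spot(keyword, table)[0][0])  # A
```

Because distances are normalized and only relative positions are histogrammed, the translated and rescaled keyword produces the same codewords as word "A" and collects all of its votes. Exact-match hashing keeps each lookup to a single dictionary access, but it is brittle: a single flipped bit yields a different codeword, so a practical system would likely need coarser bins, a tuned threshold, or approximate matching.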
