Abstract
This paper presents a word spotting approach for indexing archival document images. Indices are constructed from keyword images, and spotting is formulated as indexing by shape. The well-known shape context descriptor is used to compute word image signatures from skeleton points. Codewords are then extracted by thresholding the shape contexts, which yields a simpler and more compact representation based on bit vectors. Document images are roughly segmented into words and a lookup table is constructed in which each word subimage acts as a bin. Keyword images are spotted in documents by a voting strategy: the lookup table is indexed by codewords, and votes are cast into the corresponding bins. The approach is illustrated in a real application scenario involving documents from a digital archive of the Spanish Civil War.
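The pipeline described above (thresholded shape contexts, bit-vector codewords, a lookup table, and voting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the grid size, the threshold, the scale normalisation by the largest pairwise distance, and all function names are assumptions, since the abstract does not state them. Skeleton extraction and word segmentation are taken as given, so each word is simply a list of 2D points.

```python
import math
from collections import defaultdict

N_R, N_THETA = 3, 8   # assumed log-polar grid: 3 rings x 8 sectors
THRESHOLD = 1         # assumed: a bit is set when a bin holds >= 1 point


def shape_context(points, i, r_max):
    """Log-polar histogram of the other points as seen from points[i]."""
    px, py = points[i]
    hist = [0] * (N_R * N_THETA)
    for j, (qx, qy) in enumerate(points):
        dx, dy = qx - px, qy - py
        r = math.hypot(dx, dy)
        if j == i or r == 0.0:
            continue
        # Ring 0 is the outermost; each further ring halves the radius.
        r_bin = min(N_R - 1, int(math.log2(r_max / r)))
        theta = math.atan2(dy, dx) % (2.0 * math.pi)
        t_bin = min(N_THETA - 1, int(theta / (2.0 * math.pi) * N_THETA))
        hist[r_bin * N_THETA + t_bin] += 1
    return hist


def word_codewords(points):
    """Set of bit-vector codewords (stored as ints) for one word's points."""
    r_max = max(
        (math.hypot(p[0] - q[0], p[1] - q[1]) for p in points for q in points),
        default=0.0,
    )
    if r_max == 0.0:
        return set()
    codes = set()
    for i in range(len(points)):
        hist = shape_context(points, i, r_max)
        # Thresholding turns the histogram into a bit vector.
        codes.add(sum(1 << k for k, c in enumerate(hist) if c >= THRESHOLD))
    return codes


def build_table(words):
    """Lookup table: codeword -> ids of the word subimages containing it."""
    table = defaultdict(set)
    for word_id, points in words.items():
        for code in word_codewords(points):
            table[code].add(word_id)
    return table


def spot(table, query_points):
    """Vote for word bins that share codewords with the query keyword."""
    votes = defaultdict(int)
    for code in word_codewords(query_points):
        for word_id in table.get(code, ()):
            votes[word_id] += 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `build_table` once per document collection and `spot` once per keyword image returns candidate word subimages ranked by vote count. This sketch matches codewords exactly; a practical system would likely need some tolerance to noise in the bit vectors, which the abstract does not detail.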