Announcing the OCRopus Open Source OCR System. Apr 09, 2007. Posted by Thomas Breuel, OCRopus Project Leader. We're happy to announce the OCRopus OCR Project, a Google-sponsored project to develop advanced OCR technologies in the IUPR research g
hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and OCR-related information coexist in the same file and survive editing and manipulation. hOCR markup is independent of the presentation.
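As a rough illustration of how OCR metadata can sit inside ordinary HTML, the sketch below reads word text, bounding boxes, and confidences back out of a small hOCR fragment. It assumes the common hOCR convention of `ocrx_word` elements whose `title` attribute carries `bbox` and `x_wconf` properties; the sample markup and the helper names (`parse_title`, `WordExtractor`, `extract_words`) are made up for this example and are not part of any OCR engine's API.

```python
from html.parser import HTMLParser

# Hypothetical sample: a page with one line and two words.
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 600 800'>
  <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 100 20 200 50; x_wconf 88'>world</span>
  </span>
</div>
"""


def parse_title(title):
    """Split an hOCR title attribute into {property_name: [values]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordExtractor(HTMLParser):
    """Collects text, bbox, and confidence for each ocrx_word span.

    Simplification: assumes word spans contain plain text only
    (no nested tags inside a word).
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None outside a word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            props = parse_title(attrs.get("title") or "")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "conf": float(props["x_wconf"][0]) if "x_wconf" in props else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == "span":
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def extract_words(hocr):
    """Return a list of word dicts found in an hOCR string."""
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_HOCR):
        print(word["text"], word["bbox"], word["conf"])
```

Because the layout data lives in `class` and `title` attributes, a browser renders the fragment as plain text, which is one way to read the claim that the markup is independent of presentation.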
OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. This server allows you to use the system through your web browser.
Large quantities of historical newspapers are being digitized and OCR'd. We describe a framework for processing the OCR'd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate automatic indexing of these articles. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.
ABBYY FineReader Engine for Embedded OS is a highly portable, small-footprint OCR SDK with low resource requirements, designed to integrate document conversion technologies into MFPs and other imaging devices.
SceneReader is a new software technology designed to locate, analyse and report alphabetic text in a broad variety of photographic images, including highly complex images such as street scenes.
A. Antonacopoulos and D. Karatzas. 8th International Conference on Document Analysis and Recognition (ICDAR 2005), pages 48--53. International Association for Pattern Recognition (IAPR), IEEE Computer Society, (2005)
H. Yang, B. Quehl, and H. Sack. Proc. of the 19th International Conference on Systems, Signals and Image Processing (IWSSIP 2012), April 11-13, 2012, Vienna (Austria), pages 9--12. (2012)