- DL Consulting Blog
- OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. This server allows you to use the system through your web browser.
- Schema for representing OCR results exported from FineReader 8.0 SDK. Copyright 2001-2006 ABBYY, Inc.
- Schema for representing OCR results exported from FineReader 6.0. Copyright 2001-2002 ABBYY, Inc.
- hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and OCR-related information co-exist in the same file and survive editing and manipulation. hOCR markup is independent of the presentation.
- The purpose of this document is to define an open standard for representing OCR results. The goal is to reuse as much existing technology as possible, and to arrive at a representation that makes it easy to reuse OCR results.
- OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
- With optical character recognition (OCR), you can scan the contents of a document into a single file of editable text. This article, which focuses on scanning books, describes the steps you need to take to prepare pages for optimal OCR results, and compares various free OCR tools to determine which is the best at extracting the text.
- Large quantities of historical newspapers are being digitized and OCRd. We describe a framework for processing the OCRd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate automatic indexing of them. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.
- The National Library of Australia, in collaboration with the Australian State and Territory libraries, is creating a free online service that gives full-text searching of newspaper articles. This will include newspapers published in each state and territory from the 1800s to the mid-1950s, when copyright applies. The first Australian newspaper, published in Sydney in 1803, is included in the program. The Beta service contains 70,000 newspaper pages from 1803 onwards and additional pages are being added each week.
- METS / ALTO general information
- Digital Library Consulting Blog

