OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. This server allows you to use the system through your web browser.
hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and OCR-related information co-exist in the same file and survives editing and manipulation. hOCR markup is independent of the presentation.
hMedia is a microformat for identifying semantic information for content embedded in media-related websites and blogs. It is often desired to link or embed varying types of media into a web page and associate meta-data with the media. For example, when li
hAudio is a simple, open, distributed format, suitable for embedding information about audio recordings in (X)HTML, Atom, RSS, and arbitrary XML. hAudio is one of several microformats open standards. This document is a mapping of hAudio to RDFa.
Cognition is a parser for both “upper case Semantic Web” (RDF, RDFa) and “lower case semantic web” (microformats) technologies. It includes modules for exporting parsed data in a variety of formats, including RDF, vCard, iCalendar, Atom and KML.
Monkeyformats are Greasemonkey userscripts that add semantic markup in the Firefox browser to websites that do not yet support Microformats. You can then perform actions with these Microformats through Operator or some similar add-on.