hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and OCR-related information co-exist in the same file and survive editing and manipulation. hOCR markup is independent of the presentation.
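As a rough illustration of how that embedded information can be read back, here is a minimal Python sketch. It assumes the common hOCR convention of storing properties such as `bbox` and `x_wconf` in the `title` attribute of elements carrying classes like `ocrx_word`; the sample markup, `parse_title`, `WordExtractor`, and `extract_words` are hypothetical names made up for this example, and the parser is simplified (it does not handle markup nested inside a word).

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment, written for this example only.
SAMPLE_HOCR = """
<div class="ocr_page" title="bbox 0 0 600 800">
  <span class="ocr_line" title="bbox 10 20 300 50">
    <span class="ocrx_word" title="bbox 10 20 80 50; x_wconf 93">Hello</span>
    <span class="ocrx_word" title="bbox 90 20 180 50; x_wconf 88">world</span>
  </span>
</div>
"""


def parse_title(title):
    """Split a title string like 'bbox 1 2 3 4; x_wconf 90' into {name: [values]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordExtractor(HTMLParser):
    """Collects text, bounding box, and confidence for each ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            props = parse_title(attrs.get("title") or "")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "conf": float(props["x_wconf"][0]) if "x_wconf" in props else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Simplification: the first end tag after a word's start closes the word.
        if self._current is not None:
            self.words.append(self._current)
            self._current = None


def extract_words(hocr):
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_HOCR):
        print(word)
```

Because the OCR data lives in ordinary attributes, the same file still renders as plain HTML text in a browser, which is the point the description makes about markup being independent of presentation.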
OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. This server allows you to use the system through your web browser.
XOXO (eXtensible Open XHTML Outlines) is an XML microformat for outlines built on top of XHTML. Developed by several authors as an attempt to reuse XHTML building blocks instead of inventing unnecessary new XML elements and attributes, XOXO is based on existing conventions for publishing outlines, lists, and blogrolls on the Web. The XOXO specification defines an outline as a hierarchical, ordered list of arbitrary elements. The specification is fairly open, which makes it suitable for many types of list data. For example, the more semantic version of the S5 presentation file format is based on XOXO.
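To make the "hierarchical, ordered list" idea concrete, the following Python sketch turns a small nested-list outline into a tree of dictionaries. It assumes the usual XOXO convention of a root list marked `class="xoxo"` with nested `<ol>`/`<li>` elements; the sample markup, `OutlineParser`, and `parse_outline` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical XOXO-style outline, written for this example only.
SAMPLE_XOXO = """
<ol class="xoxo">
  <li>Introduction</li>
  <li>Formats
    <ol>
      <li>hOCR</li>
      <li>hResume</li>
    </ol>
  </li>
</ol>
"""


class OutlineParser(HTMLParser):
    """Builds a tree with one node per <li>; nesting follows the nested lists."""

    def __init__(self):
        super().__init__()
        self.root = {"text": "", "children": []}
        self._stack = [self.root]  # path from the root to the open <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            node = {"text": "", "children": []}
            self._stack[-1]["children"].append(node)
            self._stack.append(node)

    def handle_endtag(self, tag):
        if tag == "li" and len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        # Text outside any <li> is ignored.
        if len(self._stack) > 1:
            self._stack[-1]["text"] += data


def _clean(node):
    """Collapse whitespace in each node's text, recursively."""
    return {
        "text": " ".join(node["text"].split()),
        "children": [_clean(child) for child in node["children"]],
    }


def parse_outline(markup):
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return _clean(parser.root)["children"]


if __name__ == "__main__":
    for item in parse_outline(SAMPLE_XOXO):
        print(item["text"], [child["text"] for child in item["children"]])
```

The parser only looks at `<li>` boundaries, so it treats any list markup the same way; that loosely mirrors how open the format is, but it is a simplification rather than a conforming XOXO reader.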
Microformats are small and gentle syntactic touchups for your web pages. They have one major purpose: to make your data readable by both man and machine... The machine-readable-data (and thus the microformat) concept is not new; it has a very recent fo
microformats are, just as importantly, defined by what they are not: not a new language; not infinitely extensible and open-ended; not an attempt to get everyone to change their behavior and rewrite their tools; not a whole new approach that throw
some time now, I’ve wanted to increase my understanding of microformats. If you’re unfamiliar with the term or want to understand the basic purpose of this technology better, I suggest reading Phil Windley’s Microformats: Paving the Cowpaths. I read
hResume is a microformat for publishing résumé or Curriculum Vitae (CV) information [1] using (X)HTML on web pages. Like many other microformats, hResume uses CSS class names to make an otherwise non-semantic XHTML document more meaningful. A document containing résumé information can be marked up with hResume without altering how it appears in the browser, which makes the format easy to adopt.
So, what are you waiting for? The network effect tells us that the value of a technology increases the more it is used. Microformats are rapidly experiencing the benefits of this effect. Innovative publishers are publishing microformats, while innovative
Technorati has been a strong supporter of open microformats standards for quite some time. We launched the first implementation that indexes and searches posts tagged with rel-tag. We support XOXO for lists and outlines throughout the site, our member's f