OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
With optical character recognition (OCR), you can scan the contents of a document into a single file of editable text. This article, which focuses on scanning books, describes the steps you need to take to prepare pages for optimal OCR results, and compares various free OCR tools to determine which is the best at extracting the text.
LOCKSS (Lots of Copies Keep Stuff Safe) is an international non-profit community initiative that provides tools and support so libraries can easily and cost-effectively preserve today’s web-published materials for tomorrow’s readers.
»On this website you will find information about the open-source search portal OpenBib and its side projects OpenDIA and OLWS. The OpenBib portal software makes it possible to efficiently and quickly [...] a large number of catalogs from various libraries...
VuFind is a library resource portal designed and developed for libraries by libraries. The goal of VuFind is to enable your users to search and browse through all of your library's resources by replacing the traditional OPAC to include...
The Named Entity Tagger is a self-contained package which incorporates versions of SNoW and FEX, together with an inference module. It includes a network trained to recognize Person, Location, Organization and Misc. entities in English.
knowee is an open-source web contact organizer (or "online social graph manager" for better buzzword compliance). It's decentralized and lets you aggregate, track, organize, and share information about you and the people you know.
CUFTS is an open source (GPL) OpenURL link resolver designed for use by library consortia. It supports multiple sites from one server, online management tools, usage statistics, and a knowledgebase of over 350 resources with 422,000 title records
Raptor is a free software / Open Source C library that provides a set of parsers, which generate Resource Description Framework (RDF) triples by parsing syntaxes, and serializers, which write the triples out in a syntax.