Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. For more information about Tika, please see the list of supported document formats and the available documentation . You can find the latest release on the download page . See the Getting Started guide for instructions on how to start using Tika.
Tika is a subproject of Apache Lucene . Lucene is a project of the Apache Software Foundation .
reStructuredText is an easy-to-read, what-you-see-is-what-you-get plaintext markup syntax and parser system. It is useful for in-line program documentation (such as Python docstrings), for quickly creating simple web pages, and for standalone documents. reStructuredText is designed for extensibility for specific application domains. The reStructuredText parser is a component of Docutils. reStructuredText is a revision and reinterpretation of the StructuredText and Setext lightweight markup systems.