Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. For more information about Tika, please see the list of supported document formats and the available documentation. You can find the latest release on the download page. See the Getting Started guide for instructions on how to start using Tika.
Tika is a subproject of Apache Lucene. Lucene is a project of the Apache Software Foundation.
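As a rough illustration of what "detecting" a document type can involve, the sketch below guesses a MIME type from a file's leading bytes and falls back to its name. It is not Tika's API or implementation; the `detect_type` function and the small `MAGIC_SIGNATURES` table are hypothetical names introduced here, and a real detector would know far more formats.

```python
import mimetypes
from typing import Optional

# Hypothetical, deliberately tiny table of file signatures ("magic bytes").
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def detect_type(data: bytes, filename: Optional[str] = None) -> str:
    """Guess a MIME type from leading bytes, then from the file name."""
    # Content-based check: compare the start of the data with known signatures.
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    # Name-based fallback using the standard library's extension table.
    if filename:
        guessed, _encoding = mimetypes.guess_type(filename)
        if guessed:
            return guessed
    # Generic type for content that could not be identified.
    return "application/octet-stream"
```

Checking content before the file name means a PDF saved with a misleading extension is still reported as `application/pdf`.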
Step Towards Disease Outbreak Information Extraction: Automatic ...
http://naist.cpe.ku.ac.th/SlideSNLP2007/131207/A%20Step%20Towards%20Disease%20Outbreak%20Information%20Extraction%20Automatic%20Entity%20Role%20Recognition%20for%20Named%20Entities.pdf
A technique for studying disorder in quantum systems is able to spot significant patterns in large data sets such as web pages, and may be adaptable to
The cb2Bib is a tool for rapidly extracting unformatted or unstandardized bibliographic references from email alerts, journal Web pages, and PDF files.
This is the project page for SecondString, an open-source Java-based package of approximate string-matching techniques. This code was developed by researchers at Carnegie Mellon University from the Center for Automated Learning and Discovery, the Department of Statistics, and the Center for Computer and Communications Security.
SecondString is intended primarily for researchers in information integration and other scientists. It does or will include a range of string-matching methods from a variety of communities, including statistics, artificial intelligence, information retrieval, and databases. It also includes tools for systematically evaluating performance on test data. It is not designed for use on very large data sets.
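To make "approximate string matching" concrete, here is a minimal sketch of one widely used measure, the Levenshtein edit distance, together with a normalized similarity score. It is a generic textbook formulation and does not reproduce SecondString's Java API; the function names `levenshtein` and `similarity` are chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # Distances from the empty prefix of a to every prefix of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # Distance from a[:i] to the empty prefix of b.
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the edit distance to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3. The quadratic running time of this formulation is one reason such measures suit small and medium test sets better than very large ones.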
Y. Ohsawa, N. Benson, and M. Yachida. ADL '98: Proceedings of the Advances in Digital Libraries Conference, p. 12. Washington, DC, USA, IEEE Computer Society, (1998)
Y. Matsuo and M. Ishizuka. Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference, pp. 392-396. AAAI Press, (2003)
J. Chang, J. Boyd-Graber, and D. Blei. KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 169-178. New York, NY, USA, ACM, (2009)
P. Kluegl, M. Atzmueller, and F. Puppe. Proceedings of the Biennial GSCL Conference 2009, 2nd UIMA@GSCL Workshop, pp. 233-240. Gunter Narr Verlag, (2009)
T. Rattenbury, N. Good, and M. Naaman. SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 103-110. New York, NY, USA, ACM Press, (2007)
X. Wan and J. Xiao. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pp. 969-976. Manchester, UK, Coling 2008 Organizing Committee, (August 2008)