Imagine you can see 160 years of history, all on one screen. You can zoom and pan, you can look at a particular day, you can even do a search. And when you do, the results come up not as a list, but as a heat map that shows where in history that topic appears, and how often.
Katta is a scalable, fault-tolerant, distributed data store for real-time access.
Katta serves large, replicated indices as shards to handle high loads and very large data sets. These indices can be of different types; implementations are currently available for Lucene and Hadoop MapFiles.
* Makes serving large or high-load indices easy
* Serves very large Lucene or Hadoop MapFile indices as index shards on many servers
* Replicates shards on different servers for performance and fault tolerance
* Supports pluggable network topologies
* Master fail-over
* Fast, lightweight, easy to integrate
* Plays well with Hadoop clusters
* Apache License, Version 2.0
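The sharding and replication described above can be illustrated with a small placement sketch. This is not Katta's code or API: `assign_shards` and `route_query` are hypothetical helpers that place each shard on several distinct nodes and then, when a node fails, route each shard's part of a query to a surviving replica.

```python
from collections import defaultdict


def assign_shards(shards, nodes, replication):
    """Place each shard on `replication` distinct nodes, rotating the start node.

    Returns (node -> shards, shard -> replica nodes).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    node_to_shards = defaultdict(list)
    shard_to_nodes = {}
    for i, shard in enumerate(shards):
        # Consecutive nodes starting at offset i are distinct while replication <= len(nodes).
        chosen = [nodes[(i + k) % len(nodes)] for k in range(replication)]
        shard_to_nodes[shard] = chosen
        for node in chosen:
            node_to_shards[node].append(shard)
    return dict(node_to_shards), shard_to_nodes


def route_query(shard_to_nodes, live_nodes):
    """Pick one live replica per shard so a query can fan out to every shard."""
    plan = {}
    for shard, replicas in shard_to_nodes.items():
        alive = [n for n in replicas if n in live_nodes]
        if not alive:
            raise RuntimeError(f"no live replica for shard {shard}")
        plan[shard] = alive[0]  # first live replica; a real system would balance load
    return plan


if __name__ == "__main__":
    placement, replicas = assign_shards(["s0", "s1", "s2", "s3"], ["n0", "n1", "n2"], 2)
    # Simulate the failure of node n0: every shard still has a live replica.
    print(route_query(replicas, {"n1", "n2"}))
```

With two replicas per shard, losing node `n0` still leaves every shard reachable, which is the fault-tolerance property the feature list refers to. Always taking the first live replica concentrates load on `n1`; picking the least-loaded replica would spread queries more evenly.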
TuQS (Turnguard's QuadStore) is a first draft of a custom QuadStore implementation whose main focus is data-retrieval speed. It can be queried and updated through openrdf's SAIL API. Features:
* SAIL accessible
* True quad store with graph support
* High-speed regex SPARQL filters
* User rights on a per-triple basis
* Extendable to a quint store (or, more generally, an n-store)
* Cacheable SPARQL queries for further speed improvements
* Clusterable
* Supports federation
* Full-text searchable
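A quad store extends RDF triples (subject, predicate, object) with a fourth element, the graph, so the same statement can live in several named graphs. The sketch below is a hypothetical in-memory illustration of that idea and of a regex filter on object literals; it is not TuQS's design and does not use the SAIL API.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # the graph (context): the fourth element that makes a triple a quad


class QuadStore:
    def __init__(self):
        # Index quads by graph so graph-restricted queries skip other graphs entirely.
        self._by_graph = defaultdict(set)

    def add(self, s, p, o, g):
        self._by_graph[g].add(Quad(s, p, o, g))

    def match(self, s=None, p=None, o=None, g=None):
        """Yield quads matching the pattern; None acts as a wildcard."""
        graphs = [g] if g is not None else list(self._by_graph)
        for graph in graphs:
            for q in self._by_graph.get(graph, ()):
                if (s is None or q.s == s) and (p is None or q.p == p) and (o is None or q.o == o):
                    yield q

    def match_regex(self, pattern, **fixed):
        """Like match(), but keep only quads whose object matches a regex,
        roughly what a SPARQL FILTER(regex(?o, ...)) does."""
        rx = re.compile(pattern)
        return [q for q in self.match(**fixed) if rx.search(q.o)]


if __name__ == "__main__":
    store = QuadStore()
    store.add("ex:alice", "foaf:name", "Alice", "ex:g1")
    store.add("ex:alice", "foaf:name", "Alicia", "ex:g2")
    print(store.match_regex("^Ali", p="foaf:name"))
```

The prefixes `ex:` and `foaf:` are illustrative only. A production store would keep several permutation indexes (for example by subject and by predicate) instead of scanning each graph.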
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
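One ingredient of content-type detection is checking a file's leading "magic" bytes. The toy sketch below shows only that idea; Tika itself is a Java library with its own detector classes and a much larger type registry, and the hypothetical `detect` function here is not part of its API.

```python
# Leading byte signatures of a few common formats.
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]


def detect(data: bytes) -> str:
    """Return a media type based on the first bytes of the content."""
    for prefix, media_type in MAGIC:
        if data.startswith(prefix):
            return media_type
    # Generic fallback for unrecognised binary content.
    return "application/octet-stream"


if __name__ == "__main__":
    print(detect(b"%PDF-1.7\n"))
```

Magic bytes alone cannot tell container formats apart: a DOCX file also starts with `PK\x03\x04`, so a fuller detector would look inside the ZIP archive as well.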
N. Ferro and D. Harman. Multilingual Information Access Evaluation I. Text Retrieval Experiments, Volume 6241 of Lecture Notes in Computer Science. Springer, Berlin/Heidelberg, (2010).
D. Hiemstra and C. Hauff. Multilingual and Multimodal Information Access Evaluation, Volume 6360 of Lecture Notes in Computer Science, pages 64--69. Springer, Berlin, (2010).
U. Schindler and I. Drost. Java Magazin, (2010). Additional points of interest mentioned in the article:
1) The frequency of individual search queries is usually Zipf-distributed.
2) Distances between geodata points are calculated with the haversine formula.
3) Cartesian Tiers
4) The scientific information system PANGAEA
5) Google's KML Regions documentation
6) Geohashes
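Two of the points above, haversine distance and geohashes, are compact enough to sketch. The code below is a generic illustration, not the article's implementation; it assumes a spherical Earth with a mean radius of 6371 km. The geohash encoder alternates longitude and latitude bisections and packs every 5 bits into one base-32 character.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the sphere is an approximation
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geohash_encode(lat, lon, precision=9):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True  # the first bit refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(42.6, -5.6, 5))
    print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 3))
```

The first call prints `ezs42`. Points with a long common geohash prefix lie in the same cell, which makes prefix queries a cheap pre-filter; near cell boundaries close neighbours can have different prefixes, so an exact haversine check on the candidates is still needed.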