Solr builds on the well-known Lucene search engine library to create an enterprise search server with a simple HTTP/XML interface. Using Solr, large collections of documents can be indexed
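Solr's HTTP/XML interface means indexing can be as simple as posting an XML message. The sketch below only builds such a message in the `<add><doc><field name="...">` shape used by Solr's XML update format. The document fields (`id`, `title`, `tag`) are hypothetical, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts into a Solr-style <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Multi-valued fields are written as repeated <field> elements with the same name.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example document.
    print(build_add_xml([{"id": "book-1", "title": "Lucene in Action", "tag": ["search", "java"]}]))
```

To index the documents, post this body with `Content-Type: text/xml` to the server's update handler and follow it with a `<commit/>` message. The handler URL depends on the installation and Solr version, so treat any URL, such as `http://localhost:8983/solr/update`, as an assumption.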
In library catalogs, "relevance ranking" of search results is becoming increasingly important. The article describes various ways to optimize result ranking, using the Lucene-based OPAC of Heidelberg University Library (UB Heidelberg) as an example. To determine relevance, the contents of individual data fields can be analyzed and weighted. Criteria such as a title's popularity, availability, or rating, as well as user profiles, can also be taken into account. The article presents various weighting options and approaches for incorporating further criteria.
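One common way to combine field weighting with popularity, availability, and rating is a field-weighted text score multiplied by boost factors. The sketch below illustrates that idea and is not the Heidelberg OPAC's method. All weights, the loan count as a popularity signal, and the boost formulas are hypothetical choices.

```python
import math
from dataclasses import dataclass

# Hypothetical field weights: a title match counts more than a match in the notes.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "author": 2.0, "notes": 0.5}


@dataclass
class Record:
    fields: dict            # field name -> text
    loans: int = 0          # hypothetical popularity signal
    available: bool = True  # currently on the shelf?
    rating: float = 0.0     # hypothetical average user rating, 0..5


def text_score(record, terms):
    """Weighted count of query-term occurrences across the record's fields."""
    score = 0.0
    for name, text in record.fields.items():
        tokens = text.lower().split()
        hits = sum(tokens.count(t.lower()) for t in terms)
        score += FIELD_WEIGHTS.get(name, 1.0) * hits
    return score


def rank_score(record, terms):
    """Text relevance multiplied by popularity, availability and rating boosts."""
    base = text_score(record, terms)
    if base == 0:
        return 0.0
    popularity = 1.0 + math.log1p(record.loans)  # log-damped so bestsellers don't swamp relevance
    availability = 1.0 if record.available else 0.8
    rating = 1.0 + record.rating / 10.0
    return base * popularity * availability * rating


def rank(records, query):
    """Return matching records, best first; records without any hit are dropped."""
    terms = query.split()
    scored = [(rank_score(r, terms), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for s, r in scored if s > 0]
```

Multiplying the boosts keeps non-matching records at zero. Log-damping the loan count keeps popularity as a tie-breaker rather than letting it override textual relevance.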
Imagine you can see 160 years of history, all on one screen. You can zoom and pan, you can look at a particular day, you can even do a search. And when you do, the results come up not as a list, but as a heat map that shows where in history that topic appears, and how often.
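The heat-map view described above comes down to counting matching documents per time bucket and mapping the counts to color intensities. The following minimal sketch assumes each hit carries a publication date. Yearly buckets and five intensity levels are hypothetical choices.

```python
from collections import Counter
from datetime import date


def hits_per_year(hit_dates, start_year, end_year):
    """Count matching documents per year; each count becomes one heat-map cell."""
    counts = Counter(d.year for d in hit_dates if start_year <= d.year <= end_year)
    return [counts.get(year, 0) for year in range(start_year, end_year + 1)]


def to_heat_levels(counts, levels=5):
    """Scale raw counts to intensity levels 0..levels-1 relative to the busiest year."""
    peak = max(counts, default=0)
    if peak == 0:
        return [0] * len(counts)
    return [min(levels - 1, c * levels // peak) for c in counts]


if __name__ == "__main__":
    hits = [date(1900, 1, 1), date(1900, 5, 1), date(1902, 1, 1)]
    print(to_heat_levels(hits_per_year(hits, 1900, 1903)))
```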
Katta is a scalable, fault-tolerant, distributed data store for real-time access.
Katta serves large, replicated indices as shards to handle high loads and very large data sets. The indices can be of different types; implementations are currently available for Lucene indices and Hadoop MapFiles.
* Makes serving large or high-load indices easy
* Serves very large Lucene or Hadoop MapFile indices as index shards on many servers
* Replicates shards on different servers for performance and fault tolerance
* Supports pluggable network topologies
* Master failover
* Fast, lightweight, easy to integrate
* Plays well with Hadoop clusters
* Apache License, Version 2.0
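The shard-and-replica model in the feature list can be sketched in a few functions. This is a minimal in-process simulation and not Katta's API. The node names, the round-robin placement policy, and the toy term-count scoring are all hypothetical. The sketch places replicas on distinct nodes, fails over to a surviving replica, and runs a scatter-gather search that merges per-shard top-k lists.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def assign_replicas(shard_names, nodes, replication=2):
    """Place each shard on `replication` distinct nodes, spreading them round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        name: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, name in enumerate(shard_names)
    }


def deploy(shards, assignment):
    """Simulate deployment: every node receives a copy of each shard assigned to it."""
    node_data = {}
    for name, nodes in assignment.items():
        for node in nodes:
            node_data.setdefault(node, {})[name] = shards[name]
    return node_data


def pick_replica(assignment, shard_name, live_nodes):
    """Failover: use the first replica whose node is still alive."""
    for node in assignment[shard_name]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"no live replica for shard {shard_name!r}")


def search_shard(docs, terms, k):
    """Toy per-shard scoring: number of query-term occurrences in each document."""
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def distributed_search(node_data, assignment, live_nodes, query, k=10):
    """Scatter the query to one live replica per shard, then gather a global top-k."""
    terms = query.lower().split()
    targets = [(pick_replica(assignment, name, live_nodes), name) for name in assignment]
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda t: search_shard(node_data[t[0]][t[1]], terms, k), targets)
        merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(k, merged)
```

Round-robin placement keeps per-node load even when shards are of similar size. With real Lucene scoring, per-shard term statistics (such as document frequencies) can differ. Merged scores are therefore only directly comparable if those statistics are shared or normalized across shards.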
Welcome to TuQS! Turnguard's QuadStore (TuQS) is a first draft of a custom quad store implementation whose main focus is data-retrieval speed. TuQS can be queried and updated using openrdf's SAIL API.
Features:
* SAIL-accessible
* True quad store with graph support
* High-speed regex SPARQL filters
* User rights on a per-triple basis
* Extendable to a quint store (or, more generally, to an n-store)
* Cacheable SPARQL queries for further speed improvement
* Clusterable
* Federatable
* Full-text searchable
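A quad store extends each RDF triple with the graph it belongs to, so queries can be scoped to one graph. The toy in-memory sketch below illustrates that data model together with a regex filter on objects, similar in spirit to SPARQL's `FILTER(regex(...))`. It is not TuQS's implementation or the SAIL API, and the example identifiers (`ex:a`, `g1`) are hypothetical.

```python
import re
from collections import namedtuple

Quad = namedtuple("Quad", ["s", "p", "o", "g"])


class TinyQuadStore:
    """Toy in-memory quad store: each statement is a triple plus its graph."""

    def __init__(self):
        # graph name -> set of quads; lets graph-scoped queries skip other graphs entirely
        self._by_graph = {}

    def add(self, s, p, o, g):
        self._by_graph.setdefault(g, set()).add(Quad(s, p, o, g))

    def match(self, s=None, p=None, o=None, g=None, o_regex=None):
        """None acts as a wildcard; o_regex filters objects like a SPARQL regex FILTER."""
        if g is not None:
            candidates = self._by_graph.get(g, set())
        else:
            candidates = set().union(*self._by_graph.values())
        pattern = re.compile(o_regex) if o_regex is not None else None
        hits = [
            q for q in candidates
            if (s is None or q.s == s)
            and (p is None or q.p == p)
            and (o is None or q.o == o)
            and (pattern is None or pattern.search(q.o))
        ]
        return sorted(hits)
```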
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
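Format detection is the first step before a suitable parser can be chosen. The simplest detection strategy matches a file's leading "magic" bytes against known signatures. The sketch below shows only that idea, using a small hypothetical signature table. It is not Tika's API, and Tika's own detectors are considerably richer.

```python
# Hypothetical, heavily simplified signature table: (leading bytes, media type).
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def detect_media_type(data: bytes) -> str:
    """Return the media type of the first matching signature, else a generic binary type."""
    for magic, media_type in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return media_type
    return "application/octet-stream"
```

From Java, Tika's facade class `org.apache.tika.Tika` offers `detect(...)` for this step and `parseToString(...)` for text extraction.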
N. Ferro and D. Harman. Multilingual Information Access Evaluation I. Text Retrieval Experiments, volume 6241 of Lecture Notes in Computer Science, Springer, Berlin / Heidelberg, (2010)
D. Hiemstra and C. Hauff. Multilingual and Multimodal Information Access Evaluation, volume 6360 of Lecture Notes in Computer Science, pages 64–69. Springer Verlag, Berlin, (2010)
U. Schindler and I. Drost. Java Magazin, (2010)
Additional points of interest mentioned in the article:
1) The frequency of individual search queries is usually Zipf-distributed.
2) Distance calculation for geodata using the haversine formula.
3) Cartesian tiers
4) The scientific information system PANGAEA
5) Google's KML Regions documentation
6) Geohashes
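Items 2 and 6 in the list are concrete geo techniques. The following minimal sketch covers both. It assumes a spherical Earth with a mean radius of 6371 km for the haversine distance and uses the standard geohash base-32 alphabet. Function names are hypothetical and not taken from the article or from Lucene.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the spherical model is an approximation
GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geohash_encode(lat, lon, precision=9):
    """Interleave longitude/latitude bisection bits (longitude first), 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:
            chars.append(GEOHASH_BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)
```

Because each additional character narrows the cell, nearby points usually share a geohash prefix, which allows a prefix query to stand in for a coarse bounding box. Points on opposite sides of a cell boundary are the exception, so geohash-based filters typically also check neighboring cells and then refine candidates with an exact distance such as the haversine result.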