This paper evaluates the performance of several RDF stores when small pieces of information are requested from a large dataset (the DBpedia infoboxes plus two very small additional sets). The benchmark queries employ varying numbers of joins and constraints.
This paper describes the use of bitmap indices for optimizing storage of RDF triples. Implementation techniques are described and results are analyzed for data sets of different sizes, up to a billion triples.
We show how we can load the one-billion-triple LUBM benchmark set at a sustained rate of 12,692 triples/s and the 47-million-triple Wikipedia data set at a rate of 20,800 triples/s.
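The abstract above does not spell out the index layout, so the following is only a minimal sketch of the general bitmap-index idea applied to triples, not the paper's actual data structures: terms are dictionary-encoded to integer IDs, each predicate keeps a bitmap over subject IDs, and conjunctive constraints reduce to bitwise AND.

```python
# Illustrative sketch of bitmap-indexing RDF triples (NOT the paper's
# implementation). Terms are dictionary-encoded to integers; for each
# predicate, a Python int serves as a bitset marking which subject IDs
# occur with that predicate.

class BitmapTripleIndex:
    def __init__(self):
        self._ids = {}           # term -> integer ID
        self._terms = []         # integer ID -> term
        self._pred_bitmaps = {}  # predicate ID -> bitset over subject IDs

    def _encode(self, term):
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def add(self, subj, pred, obj):
        s, p = self._encode(subj), self._encode(pred)
        self._encode(obj)  # objects are dictionary-encoded too, though unused below
        self._pred_bitmaps[p] = self._pred_bitmaps.get(p, 0) | (1 << s)

    def subjects_with_all(self, *preds):
        """Subjects that have every given predicate: bitwise AND of the bitmaps."""
        bits = ~0
        for pred in preds:
            bits &= self._pred_bitmaps.get(self._ids.get(pred, -1), 0)
        result, s = [], 0
        while bits:
            if bits & 1:
                result.append(self._terms[s])
            bits >>= 1
            s += 1
        return result

# Usage: which subjects have both a name and a mailbox?
idx = BitmapTripleIndex()
idx.add("ex:alice", "foaf:name", '"Alice"')
idx.add("ex:alice", "foaf:mbox", "mailto:alice@example.org")
idx.add("ex:bob", "foaf:name", '"Bob"')
both = idx.subjects_with_all("foaf:name", "foaf:mbox")  # -> ["ex:alice"]
```

The appeal of the technique for large stores is that a multi-predicate constraint becomes a handful of word-parallel AND operations over compact bitmaps rather than a join over full triple tables.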
The Lehigh University Benchmark was developed to facilitate the evaluation of Semantic Web repositories in a standard and systematic way. The benchmark is intended to evaluate the performance of those repositories with respect to extensional queries over a large data set that commits to a single realistic ontology. It consists of a university domain ontology, customizable and repeatable synthetic data, a set of test queries, and several performance metrics.
Kowari is an open-source (Mozilla Public License) triplestore optimized for RDF storage, created by Tucana Technologies and written entirely in Java 1.4.2. It began its life as the storage component of the Tucana Knowledge Server (TKS), Tucana's proprietary knowledge management suite, and remains under active development by Tucana.
The aim of the Ingenta MetaStore project is to build a flexible and scalable repository for the storage of bibliographic metadata spanning 17 million articles and 20,000 publications.
The repository replaces several existing data stores and will act as a focal point for integration of a number of existing applications and future projects. Scalability, replication and robustness were important considerations in the repository design.
After introducing the benefits of using RDF as the data model for this repository, the paper will focus on the practical challenges involved in creating and managing a very large triple store.
The repository currently contains over 200 million triples from a range of vocabularies including FOAF, Dublin Core and PRISM.
The challenges faced range from schema design and data loading to SPARQL query performance. Load testing of the repository provided some insights into the tuning of SPARQL queries.
The paper will introduce the solutions developed to meet these challenges with the goal of helping others seeking to deploy a large triple store in a production environment. The paper will also suggest some avenues for further research and development.
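The Ingenta schema and queries are not reproduced in the abstract above, so the sketch below is a hypothetical illustration of one common SPARQL-tuning insight from such load testing: in stores without a cost-based optimizer, evaluating the most selective triple pattern first shrinks the intermediate results that later joins must scan. The data, names, and pattern evaluator here are invented for illustration.

```python
# Toy nested-loop evaluation of SPARQL-style triple patterns over an
# invented FOAF/Dublin Core fragment (NOT Ingenta's data or engine).
# Variables start with "?"; patterns are joined left to right.

triples = [
    ("ex:article1", "dc:title",   "On Large Triple Stores"),
    ("ex:article1", "dc:creator", "ex:alice"),
    ("ex:article2", "dc:title",   "Another Paper"),
    ("ex:article2", "dc:creator", "ex:bob"),
    ("ex:alice",    "foaf:name",  "Alice"),
    ("ex:bob",      "foaf:name",  "Bob"),
]

def match(pattern, bindings):
    """Extend one set of variable bindings against every matching triple."""
    out = []
    for triple in triples:
        b, ok = dict(bindings), True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if b.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            out.append(b)
    return out

def query(patterns):
    """Join triple patterns left to right, like a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings

# The highly selective dc:title pattern comes first, so the later
# dc:creator and foaf:name joins start from a single intermediate row
# instead of one row per article in the store.
rows = query([
    ("?article", "dc:title",   "On Large Triple Stores"),
    ("?article", "dc:creator", "?author"),
    ("?author",  "foaf:name",  "?name"),
])
```

Reversing the pattern order gives the same final answer but materializes one intermediate binding per article, which is exactly the kind of blow-up that query tuning against a 200-million-triple store has to avoid.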