Article

Are NoSQL Data Stores Useful for Bioinformatics Researchers

B. Shao and T. Conrad.
International Journal on Recent and Innovation Trends in Computing and Communication, 3 (3): 1704--1708 (March 2015)
DOI: 10.17762/ijritcc2321-8169.1503176

Abstract

The big data challenge in bioinformatics is approaching. Data storage and processing, rather than experimental technologies, are becoming the slower and more costly part of research. Biological data are typically large and come in a variety of structures, so the ability to store and retrieve them efficiently is important in bioinformatics research. Traditionally, large datasets are stored either as disk-based flat files or in relational databases. These systems become increasingly complicated to plan, maintain and adapt to big data applications, because they follow rigid table schemas and often lack scalability, e.g. for data aggregation. Meanwhile, non-relational (NoSQL) databases have emerged as alternative, flexible and more scalable data stores. In this study, we aim to quantitatively compare the latencies of different data stores when storing and querying proteomics datasets. We show benchmarks for typical relational and non-relational systems, in both in-memory and disk-based configurations, and compare them to a simple flat-file approach. We focus on the latencies of storing and querying proteomics mass spectrometry datasets and on the actual space consumed inside the data stores. Experiments are carried out on a local desktop with medium-sized data, which is the typical experimental setting of individual bioinformatics researchers. Results show significant latency differences among the considered data stores (up to 30-fold). In certain use cases, a flat-file system can achieve performance comparable to that of the data stores.
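The kind of comparison described in the abstract can be illustrated in miniature with the sketch below, which uses only the Python standard library. It is not the authors' benchmark and does not use their systems or data. In this sketch, SQLite stands in for a relational store in in-memory (":memory:") and disk-based configurations, `dbm.dumb` stands in for a disk-based key-value (NoSQL-style) store, and a CSV file serves as the flat-file baseline. The synthetic (spectrum_id, m/z, intensity) peak rows and all function names are hypothetical.

```python
import csv
import dbm.dumb
import json
import os
import random
import sqlite3
import tempfile
import time


def make_spectra(n_spectra, peaks_per_spectrum, seed=0):
    """Synthetic (hypothetical) peak rows: (spectrum_id, m/z, intensity)."""
    rng = random.Random(seed)
    return [
        (sid, round(rng.uniform(100.0, 2000.0), 4), round(rng.uniform(0.0, 1e6), 1))
        for sid in range(n_spectra)
        for _ in range(peaks_per_spectrum)
    ]


def bench_sqlite(rows, path, query_id):
    """Relational store: bulk insert, then an indexed lookup of one spectrum."""
    conn = sqlite3.connect(path)  # ":memory:" gives the in-memory configuration
    conn.execute("CREATE TABLE peaks (spectrum_id INTEGER, mz REAL, intensity REAL)")
    conn.execute("CREATE INDEX idx_spectrum ON peaks (spectrum_id)")
    t0 = time.perf_counter()
    conn.executemany("INSERT INTO peaks VALUES (?, ?, ?)", rows)
    conn.commit()
    t1 = time.perf_counter()
    hits = conn.execute(
        "SELECT mz, intensity FROM peaks WHERE spectrum_id = ?", (query_id,)
    ).fetchall()
    t2 = time.perf_counter()
    conn.close()
    return t1 - t0, t2 - t1, len(hits)


def bench_key_value(rows, path, query_id):
    """Key-value store: one JSON document per spectrum, fetched by its key."""
    grouped = {}
    for sid, mz, intensity in rows:
        grouped.setdefault(sid, []).append([mz, intensity])
    t0 = time.perf_counter()
    db = dbm.dumb.open(path, "n")  # "n": always create a new, empty database
    for sid, peaks in grouped.items():
        db[str(sid)] = json.dumps(peaks)
    db.close()  # dbm.dumb writes its index file on close
    t1 = time.perf_counter()
    db = dbm.dumb.open(path, "r")  # query time here includes reopening the store
    hits = json.loads(db[str(query_id)])
    db.close()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(hits)


def bench_flat_file(rows, path, query_id):
    """Flat-file baseline: write a CSV, then query it with a full sequential scan."""
    t0 = time.perf_counter()
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    t1 = time.perf_counter()
    with open(path, newline="") as f:
        hits = [r for r in csv.reader(f) if int(r[0]) == query_id]
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(hits)


def run_benchmarks(n_spectra=200, peaks_per_spectrum=50, query_id=7):
    """Return {store name: (store seconds, query seconds, peaks returned)}."""
    rows = make_spectra(n_spectra, peaks_per_spectrum)
    with tempfile.TemporaryDirectory() as tmp:
        return {
            "sqlite (in-memory)": bench_sqlite(rows, ":memory:", query_id),
            "sqlite (disk)": bench_sqlite(rows, os.path.join(tmp, "peaks.db"), query_id),
            "key-value (dbm.dumb, disk)": bench_key_value(
                rows, os.path.join(tmp, "peaks_kv"), query_id),
            "flat file (CSV)": bench_flat_file(rows, os.path.join(tmp, "peaks.csv"), query_id),
        }


if __name__ == "__main__":
    results = run_benchmarks()
    # Guard against a zero timer reading before computing fold differences.
    fastest = max(min(q for _, q, _ in results.values()), 1e-9)
    for name, (store_s, query_s, n_hits) in results.items():
        print(f"{name:28s} store {store_s * 1e3:8.2f} ms  "
              f"query {query_s * 1e3:7.3f} ms  ({query_s / fastest:5.1f}x fastest, {n_hits} peaks)")
```

Absolute timings from this sketch depend on hardware, Python version and dataset size, and are not comparable to the figures reported in the paper. The sketch only shows how storing and querying latencies can be measured side by side for each store, and how a fold difference between stores is computed. It does not measure space consumption.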

BibTeX key: Shao_2015
entry type: article
year: 2015
month: march
journal: International Journal on Recent and Innovation Trends in Computing and Communication
number: 3
pages: 1704--1708
publisher: Auricle Technologies, Pvt., Ltd.
volume: 3
DOI: 10.17762/ijritcc2321-8169.1503176
url: http://dx.doi.org/10.17762/ijritcc2321-8169.1503176


Cite this publication

%0 Journal Article
%1 Shao_2015
%A Shao, Borong
%A Conrad, Tim OF
%D 2015
%I Auricle Technologies, Pvt., Ltd.
%J International Journal on Recent and Innovation Trends in Computing and Communication
%K and data databases latencies non-relational proteomics querying relational storing vs.
%N 3
%P 1704--1708
%R 10.17762/ijritcc2321-8169.1503176
%T Are NoSQL Data Stores Useful for Bioinformatics Researchers
%U http://dx.doi.org/10.17762/ijritcc2321-8169.1503176
%V 3
%X The big data challenge in bioinformatics is approaching. Data storage and processing, instead of experimental technologies, are becoming the slower and more costly part of research. Biological data typically have large size and a variety of structures. The ability to efficiently store and retrieve the data is important in bioinformatics research. Traditionally, large datasets are either stored as disk-based flat-files or in relational databases. These systems become more complicated to plan, maintain and adjust to big data applications as they follow rigid table schema and often lack scalability, e.g. for data aggregation. Meanwhile, non-relational databases (NoSQL) emerge to provide alternative, flexible and more scalable data stores. In this study, we aim to quantitatively compare the latencies of different data stores on storing and querying proteomics datasets. We show benchmarks for typical relational and non-relational systems for both, in-memory and disk-based configurations and compare them to a simple flat-file based approach. We will focus on the latencies of storing and querying proteomics mass spectrometry datasets and the actual space consumption inside the data stores. Experiments are carried out on a local desktop with medium-sized data, which is the typical experimental settings of individual bioinformatics researchers. Results show that there are significant latency differences among the considered data stores (up to 30 folds). In certain use cases, flat file system can achieve comparable performance with the data stores.

@article{Shao_2015,
  abstract = {The big data challenge in bioinformatics is approaching. Data storage and processing, instead of experimental technologies, are becoming the slower and more costly part of research. Biological data typically have large size and a variety of structures. The ability to efficiently store and retrieve the data is important in bioinformatics research. Traditionally, large datasets are either stored as disk-based flat-files or in relational databases. These systems become more complicated to plan, maintain and adjust to big data applications as they follow rigid table schema and often lack scalability, e.g. for data aggregation. Meanwhile, non-relational databases (NoSQL) emerge to provide alternative, flexible and more scalable data stores. In this study, we aim to quantitatively compare the latencies of different data stores on storing and querying proteomics datasets. We show benchmarks for typical relational and non-relational systems for both, in-memory and disk-based configurations and compare them to a simple flat-file based approach. We will focus on the latencies of storing and querying proteomics mass spectrometry datasets and the actual space consumption inside the data stores. Experiments are carried out on a local desktop with medium-sized data, which is the typical experimental settings of individual bioinformatics researchers. Results show that there are significant latency differences among the considered data stores (up to 30 folds). In certain use cases, flat file system can achieve comparable performance with the data stores.},
  added-at = {2015-08-13T08:45:49.000+0200},
  author = {Shao, Borong and Conrad, Tim OF},
  biburl = {https://www.bibsonomy.org/bibtex/2bda7bf6e73f4645dc30b94070930d681/ijritcc},
  doi = {10.17762/ijritcc2321-8169.1503176},
  interhash = {0a5d6029607d3ff93a80128611cbf36f},
  intrahash = {bda7bf6e73f4645dc30b94070930d681},
  journal = {International Journal on Recent and Innovation Trends in Computing and Communication},
  keywords = {and data databases latencies non-relational proteomics querying relational storing vs.},
  month = {march},
  number = 3,
  pages = {1704--1708},
  publisher = {Auricle Technologies, Pvt., Ltd.},
  timestamp = {2015-08-13T08:45:49.000+0200},
  title = {Are {NoSQL} Data Stores Useful for Bioinformatics Researchers},
  url = {http://dx.doi.org/10.17762/ijritcc2321-8169.1503176},
  volume = 3,
  year = 2015
}
