Space-Efficient and Exact de Bruijn Graph Representation Based on a Bloom Filter

Algorithms in Bioinformatics, volume 7534 of Lecture Notes in Computer Science, Springer Berlin Heidelberg (2012)
DOI: 10.1007/978-3-642-33122-0_19

Abstract

The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥ 30 GB). We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. An assembly software implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
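The idea summarized in the abstract, a Bloom filter plus a side structure that removes critical false positives, can be illustrated with a minimal Python sketch. This is not Minia's implementation. The names (`BloomFilter`, `build_graph`, `exact_neighbors`), the use of BLAKE2b as the hash function, and the plain Python set used for the critical false positives are all assumptions made for illustration, and reverse complements are ignored. A "critical" false positive is taken here to mean a k-mer that the filter accepts, that is absent from the true k-mer set, and that is a neighbor of a true k-mer.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over strings (illustrative, not space-optimized)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash function by salting BLAKE2b.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """All distinct k-mers of seq (the nodes of the de Bruijn graph)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def neighbors(kmer):
    """The up-to-8 candidate neighbors: 4 right and 4 left extensions."""
    out = set()
    for c in "ACGT":
        out.add(kmer[1:] + c)   # successor candidates
        out.add(c + kmer[:-1])  # predecessor candidates
    return out


def build_graph(seq, k, num_bits, num_hashes):
    """Return (bloom, critical_fp, true_nodes) for the k-mers of seq."""
    true_nodes = kmers(seq, k)
    bloom = BloomFilter(num_bits, num_hashes)
    for km in true_nodes:
        bloom.add(km)
    # Critical false positives: neighbors of true nodes that the filter
    # accepts although they are not true nodes.
    critical_fp = {
        n
        for km in true_nodes
        for n in neighbors(km)
        if n in bloom and n not in true_nodes
    }
    return bloom, critical_fp, true_nodes


def exact_neighbors(kmer, bloom, critical_fp):
    """Neighbors of a true node, using only the filter and the cFP set."""
    return {n for n in neighbors(kmer) if n in bloom and n not in critical_fp}
```

When traversal starts from a true node and only follows edges returned by `exact_neighbors`, every answer matches the real graph, even with a deliberately undersized filter such as `build_graph("ACGTACGGTCATTGCAAGT", 4, 64, 2)`. False positives that are not adjacent to any true node are never queried, so they need not be stored.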
