Fast and scalable genome-wide inference of local tree topologies from large number of haplotypes based on tree consistent PBWT data structure

bioRxiv (2019)
DOI: 10.1101/542035

Abstract

Estimating the relationships between DNA sequences is one of the most important problems in genomics. Understanding these relationships is central to demographic inference, correction for population structure in GWAS, and identification of signals of selection. The data structure containing the full information about the sample genealogy is called the ancestral recombination graph (ARG). However, ARG inference is a very difficult problem, not least because of its very complex state space. In this work we describe a new approach for fast and scalable generation of local tree topologies relating large numbers of haplotypes. Our method is closely related to ARG estimation and captures both local and global properties of an ARG. It is based on a data structure that we call the tree consistent PBWT, a modification of the PBWT (positional Burrows-Wheeler transform) data structure introduced by R. Durbin (2014). We also explore some methods to estimate the quality of the generated tree topologies and to make inferences based on them. Finally, we discuss a probabilistic model that could potentially lead to the estimation of ARG node times.
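
For readers unfamiliar with the PBWT that the abstract builds on, the following is a minimal sketch of the standard construction from Durbin (2014): at each site, haplotypes are kept sorted by their reversed prefixes, and a divergence array records where each haplotype's match with its predecessor in that order begins. It does not implement the tree consistent modification described in the paper. The function name `pbwt_arrays` and the toy input are assumptions made for illustration only.

```python
def pbwt_arrays(haplotypes):
    """Standard PBWT sweep over biallelic haplotypes (lists of 0/1).

    Returns two lists with one entry per site k:
      - the positional prefix array after processing site k
        (haplotype indices sorted by reversed prefix ending at k), and
      - the divergence array after processing site k
        (start of the match between each haplotype and the one
        preceding it in that order; k + 1 means no match).
    Illustrative sketch only; not the tree consistent variant.
    """
    n_haps = len(haplotypes)
    n_sites = len(haplotypes[0]) if n_haps else 0

    order = list(range(n_haps))  # initial order: input order
    div = [0] * n_haps           # initial divergence values
    prefix_arrays, div_arrays = [], []

    for k in range(n_sites):
        zeros, ones = [], []          # haplotypes carrying 0 / 1 at site k
        zeros_div, ones_div = [], []  # their new divergence values
        p = q = k + 1                 # running match starts for each group
        for idx, start in zip(order, div):
            p = max(p, start)
            q = max(q, start)
            if haplotypes[idx][k] == 0:
                zeros.append(idx)
                zeros_div.append(p)
                p = 0
            else:
                ones.append(idx)
                ones_div.append(q)
                q = 0
        # Stable partition: all 0-carriers first, then all 1-carriers.
        order = zeros + ones
        div = zeros_div + ones_div
        prefix_arrays.append(order)
        div_arrays.append(div)

    return prefix_arrays, div_arrays


# Hypothetical toy panel: 4 haplotypes, 3 sites.
haplotypes = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
prefix_arrays, div_arrays = pbwt_arrays(haplotypes)
```

In this toy panel, haplotypes 0 and 3 are identical, so they end up adjacent in the final ordering with a divergence value of 0. Adjacency of closely matching haplotypes is the property that makes the PBWT useful for fast haplotype matching.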
