Abstract
Changes in population size influence the genetic diversity of a population and,
as a result, leave a signature in the genomes of individuals from that
population. We are interested in the inverse problem of reconstructing past
population dynamics from genomic data. We start with a standard framework based
on the coalescent, a stochastic process that generates genealogies connecting
randomly sampled individuals from the population of interest. These genealogies
serve as the link between the population's demographic history and genomic
sequences. It turns out that only the times of genealogical lineage
coalescences contain information about population size dynamics. Viewing these
coalescent times as a point process, estimating population size trajectories is
equivalent to estimating a conditional intensity of this point process.
Therefore, our inverse problem is similar to estimating an inhomogeneous
Poisson process intensity function. We demonstrate how recent advances in
Gaussian process-based nonparametric inference for Poisson processes can be
extended to Bayesian nonparametric estimation of population size dynamics under
the coalescent. We compare our Gaussian process (GP) approach to one of the
state-of-the-art Gaussian Markov random field (GMRF) methods for estimating
population trajectories. Using simulated data, we demonstrate that our method
has better accuracy and precision. Next, we analyze two genealogies
reconstructed from real sequences of hepatitis C and human influenza A viruses.
In both cases, we recover more of the accepted features of the viral demographic
histories than the GMRF approach does. We also find that our GP method produces more
reasonable uncertainty estimates than the GMRF method.
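
To illustrate the point-process view of coalescent times, the following is a minimal sketch and not the GP method described above. It assumes a constant effective population size, samples that are all taken at the same time, and time measured in units where k lineages coalesce at rate C(k, 2) / N. The function names `simulate_coalescent_times` and `constant_size_mle` are hypothetical and introduced only for this example.

```python
import math
import random


def simulate_coalescent_times(n_samples, pop_size, rng):
    """Simulate waiting times between coalescent events.

    Assumption for this sketch: constant population size, so while k
    lineages remain the waiting time is exponential with rate
    C(k, 2) / pop_size.
    """
    waits = []
    for k in range(n_samples, 1, -1):
        rate = math.comb(k, 2) / pop_size
        waits.append(rng.expovariate(rate))
    return waits


def constant_size_mle(waits, n_samples):
    """Maximum likelihood estimate of a constant population size.

    The likelihood is a product of exponential densities, one per
    interval, which gives N_hat = sum_k C(k, 2) * w_k / (n - 1).
    """
    lineage_counts = range(n_samples, 1, -1)
    total = sum(math.comb(k, 2) * w for k, w in zip(lineage_counts, waits))
    return total / (n_samples - 1)


if __name__ == "__main__":
    rng = random.Random(1)
    waits = simulate_coalescent_times(n_samples=50, pop_size=1000.0, rng=rng)
    print("estimated population size:", constant_size_mle(waits, 50))
```

Only the inter-coalescent waiting times enter the estimate, which mirrors the observation that coalescent times carry the information about population size. A nonparametric approach replaces the single constant with a function of time.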