Gaussian Process-Based Bayesian Nonparametric Inference of Population
Trajectories from Gene Genealogies
J. Palacios and V. Minin. (2011). arXiv:1112.4138. Comment: 25 pages with 7 figures, revised version; added total variation metric to compare methods; influenza example updated with new GP results; added discussion and implementation of alternative GP kernels (OU and sparse approximation of the integrated Brownian motion).
Abstract
Changes in population size influence genetic diversity of the population and,
as a result, leave a signature of these changes in individual genomes in the
population. We are interested in the inverse problem of reconstructing past
population dynamics from genomic data. We start with a standard framework based
on the coalescent, a stochastic process that generates genealogies connecting
randomly sampled individuals from the population of interest. These genealogies
serve as a glue between the population demographic history and genomic
sequences. It turns out that only the times of genealogical lineage
coalescences contain information about population size dynamics. Viewing these
coalescent times as a point process, estimating population size trajectories is
equivalent to estimating a conditional intensity of this point process.
Therefore, our inverse problem is similar to estimating an inhomogeneous
Poisson process intensity function. We demonstrate how recent advances in
Gaussian process-based nonparametric inference for Poisson processes can be
extended to Bayesian nonparametric estimation of population size dynamics under
the coalescent. We compare our Gaussian process (GP) approach to one of the
state of the art Gaussian Markov random field (GMRF) methods for estimating
population trajectories. Using simulated data, we demonstrate that our method
has better accuracy and precision. Next, we analyze two genealogies
reconstructed from real sequences of hepatitis C and human Influenza A viruses.
In both cases, we recover more believed aspects of the viral demographic
histories than the GMRF approach. We also find that our GP method produces more
reasonable uncertainty estimates than the GMRF method.
Description
[1112.4138] Gaussian Process-Based Bayesian Nonparametric Inference of Population Trajectories from Gene Genealogies
%0 Generic
%1 palacios2011gaussian
%A Palacios, Julia A.
%A Minin, Vladimir N.
%D 2011
%K Gaussian_processes effective_population_size statistics
%T Gaussian Process-Based Bayesian Nonparametric Inference of Population
Trajectories from Gene Genealogies
%U http://arxiv.org/abs/1112.4138
@misc{palacios2011gaussian,
abstract = {Changes in population size influence genetic diversity of the population and,
as a result, leave a signature of these changes in individual genomes in the
population. We are interested in the inverse problem of reconstructing past
population dynamics from genomic data. We start with a standard framework based
on the coalescent, a stochastic process that generates genealogies connecting
randomly sampled individuals from the population of interest. These genealogies
serve as a glue between the population demographic history and genomic
sequences. It turns out that only the times of genealogical lineage
coalescences contain information about population size dynamics. Viewing these
coalescent times as a point process, estimating population size trajectories is
equivalent to estimating a conditional intensity of this point process.
Therefore, our inverse problem is similar to estimating an inhomogeneous
Poisson process intensity function. We demonstrate how recent advances in
Gaussian process-based nonparametric inference for Poisson processes can be
extended to Bayesian nonparametric estimation of population size dynamics under
the coalescent. We compare our Gaussian process (GP) approach to one of the
state of the art Gaussian Markov random field (GMRF) methods for estimating
population trajectories. Using simulated data, we demonstrate that our method
has better accuracy and precision. Next, we analyze two genealogies
reconstructed from real sequences of hepatitis C and human Influenza A viruses.
In both cases, we recover more believed aspects of the viral demographic
histories than the GMRF approach. We also find that our GP method produces more
reasonable uncertainty estimates than the GMRF method.},
added-at = {2012-10-23T19:33:39.000+0200},
author = {Palacios, Julia A. and Minin, Vladimir N.},
biburl = {https://www.bibsonomy.org/bibtex/226278f54e28d37bb5277328db02f653d/peter.ralph},
description = {[1112.4138] Gaussian Process-Based Bayesian Nonparametric Inference of Population Trajectories from Gene Genealogies},
interhash = {dcb2afb99dc122553bebcf1e9598ac23},
intrahash = {26278f54e28d37bb5277328db02f653d},
keywords = {Gaussian_processes effective_population_size statistics},
note = {arXiv:1112.4138. Comment: 25 pages with 7 figures, revised version; added total variation metric to compare methods; influenza example updated with new GP results; added discussion and implementation of alternative GP kernels (OU and sparse approximation of the integrated Brownian motion)},
timestamp = {2012-10-23T19:33:39.000+0200},
title = {Gaussian Process-Based Bayesian Nonparametric Inference of Population
Trajectories from Gene Genealogies},
url = {http://arxiv.org/abs/1112.4138},
year = 2011
}