Abstract
A wide range of studies in population genetics have employed the sample
frequency spectrum (SFS), a summary statistic that describes the distribution
of mutant alleles at a polymorphic site in a sample of DNA sequences. In
particular, there has been growing recent interest in analyzing joint SFS
data from multiple populations to infer parameters of complex demographic
histories, including variable population sizes, population split times,
migration rates, and admixture proportions. Although much methodological
progress has been made, existing SFS-based inference methods suffer from
numerical instability and high computational complexity when multiple
populations are involved and the sample size is large. In this paper, we
present new analytic formulas and algorithms that enable efficient computation
of the expected joint SFS for multiple populations related by a complex
demographic model with arbitrary population size histories (including piecewise
exponential growth). Our results are implemented in a new software package
called momi (MOran Models for Inference). Through an empirical study involving
tens of populations, we demonstrate our improvements to numerical stability and
computational complexity.