Abstract

We introduce a low-dimensional function of the site frequency spectrum that is tailor-made for distinguishing coalescent models with multiple mergers from Kingman coalescent models with population growth, and use this function to construct a hypothesis test between these two model classes. The null and alternative sampling distributions of our statistic are intractable, but its low dimensionality renders these distributions amenable to Monte Carlo estimation. We construct kernel density estimates of the sampling distributions based on simulated data, and show that the resulting hypothesis test dramatically improves on the statistical power of a current state-of-the-art method. A key reason for this improvement is the use of multi-locus data, in particular averaging observed site frequency spectra across unlinked loci to reduce sampling variance. We also demonstrate the robustness of our method to nuisance and tuning parameters, and argue that it is readily generalisable to applications in hypothesis testing, parameter inference and experimental design.
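The variance-reduction idea in the abstract (averaging site frequency spectra across unlinked loci, then estimating the sampling distribution of a low-dimensional summary by kernel density estimation) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `toy_sfs` is not a coalescent simulator (it draws entry i independently as Poisson with mean theta / i, which only matches the Kingman expectation), `singleton_fraction` is a hypothetical stand-in summary rather than the paper's statistic, and the bandwidth is a generic rule of thumb.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def toy_sfs(rng, n, theta):
    """Toy SFS for sample size n: entry i ~ Poisson(theta / i).

    Assumption: a stand-in for a real coalescent simulation.
    """
    return [poisson(rng, theta / i) for i in range(1, n)]


def singleton_fraction(sfs):
    """Hypothetical low-dimensional summary: share of singleton sites."""
    total = sum(sfs)
    return sfs[0] / total if total else 0.0


def averaged_statistic(rng, n, theta, loci):
    """Average the SFS entrywise over independent loci, then summarise."""
    spectra = [toy_sfs(rng, n, theta) for _ in range(loci)]
    mean_sfs = [sum(column) / loci for column in zip(*spectra)]
    return singleton_fraction(mean_sfs)


def gaussian_kde(samples):
    """Gaussian kernel density estimate with a rule-of-thumb bandwidth."""
    h = 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

    return density


rng = random.Random(1)
single_locus = [averaged_statistic(rng, 10, 5.0, 1) for _ in range(300)]
multi_locus = [averaged_statistic(rng, 10, 5.0, 20) for _ in range(300)]

sd_single = statistics.stdev(single_locus)
sd_multi = statistics.stdev(multi_locus)
density = gaussian_kde(multi_locus)  # Monte Carlo estimate of the sampling density
print(f"sd with 1 locus: {sd_single:.3f}, sd with 20 loci: {sd_multi:.3f}")
```

In a test of the kind the abstract describes, one such density would be estimated under each model class and the observed statistic compared against both; this sketch only shows the estimation step under a single toy model.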

Description

[1701.07787] Multi-locus data distinguishes between population growth and multiple merger coalescents