
Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data

, and . The American Journal of Human Genetics, 93 (5): 840 - 851 (2013)
DOI: http://dx.doi.org/10.1016/j.ajhg.2013.09.014

Abstract

Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.
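
The LOD score mentioned in the abstract can be illustrated with a minimal sketch. It is not the IBDseq implementation: the function name `lod_score`, its inputs, and the assumption that markers contribute independently (so per-marker log ratios can be summed) are all hypothetical and chosen only for illustration. The per-marker probabilities under each model are taken as given; the abstract does not say how they are computed.

```python
import math

def lod_score(p_ibd, p_non_ibd):
    """Sum of per-marker log10 likelihood ratios (IBD vs. non-IBD).

    p_ibd, p_non_ibd: hypothetical per-marker probabilities of the observed
    genotype pair under the IBD and non-IBD models. Markers are assumed
    independent here, which is a simplification made for this sketch.
    """
    if len(p_ibd) != len(p_non_ibd):
        raise ValueError("need one probability per marker under each model")
    return sum(math.log10(a / b) for a, b in zip(p_ibd, p_non_ibd))

# Made-up probabilities for two markers, each ten times more likely under IBD.
example = lod_score([0.1, 0.1], [0.01, 0.01])
```

A positive score favors the IBD model for that pair of individuals over the markers considered, and a negative score favors the non-IBD model.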
