A scalable estimator of SNP heritability for biobank-scale data

Bioinformatics (Oxford, England), 34 (13): i187-i194 (July 2018)
DOI: 10.1093/bioinformatics/bty253

Abstract

MOTIVATION: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation have motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation: the parameters of the LMM, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens.

RESULTS: We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moments estimator that has a runtime complexity of O(NMB) for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMB / max(log_3 N, log_3 M)). We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min.

AVAILABILITY AND IMPLEMENTATION: The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.
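
To make the O(NMB) claim concrete, here is a minimal Python sketch of a randomized method-of-moments (Haseman-Elston style) heritability estimator. It is an illustration written for this summary, not the RHE-reg implementation. It assumes a single genetic variance component, the model y ~ N(0, sigma_g^2 K + sigma_e^2 I) with K = X X^T / M for column-standardized genotypes X, no monomorphic SNPs, and no covariates. The function name `estimate_h2` and its parameters are hypothetical. The term tr(K^2) is approximated with B random Gaussian vectors, so K is never formed explicitly and the dominant cost is the O(NMB) products with X. The further speed-up from the discrete structure of the genotype matrix is not shown.

```python
import numpy as np


def estimate_h2(genotypes, phenotype, num_random_vectors=10, seed=0):
    """Randomized method-of-moments estimate of SNP heritability.

    genotypes: (N, M) array of allele counts (assumed: no constant columns).
    phenotype: length-N array.
    num_random_vectors: B, the number of random vectors used for tr(K^2).
    """
    rng = np.random.default_rng(seed)
    n, m = genotypes.shape

    # Standardize each SNP to mean 0, variance 1; center the phenotype.
    X = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    y = phenotype - phenotype.mean()

    # tr(K) is computed exactly; it equals N for standardized columns.
    trace_k = np.sum(X ** 2) / m

    # Randomized trace estimate: tr(K^2) ~ (1/B) * sum_b ||K z_b||^2,
    # with K z_b = X (X^T z_b) / M, so K itself is never built.
    Z = rng.standard_normal((n, num_random_vectors))
    KZ = X @ (X.T @ Z) / m
    trace_k2 = np.sum(KZ ** 2) / num_random_vectors

    # Quadratic forms in the phenotype: y^T K y and y^T y.
    y_k_y = np.sum((X.T @ y) ** 2) / m
    y_y = y @ y

    # Moment (normal) equations for the two variance components:
    # [tr(K^2) tr(K)] [sigma_g^2]   [y^T K y]
    # [tr(K)   N    ] [sigma_e^2] = [y^T y  ]
    lhs = np.array([[trace_k2, trace_k], [trace_k, n]])
    rhs = np.array([y_k_y, y_y])
    sigma_g2, sigma_e2 = np.linalg.solve(lhs, rhs)

    return sigma_g2 / (sigma_g2 + sigma_e2)
```

Increasing `num_random_vectors` lowers the variance of the trace approximation at a proportional increase in runtime, which is the role the abstract assigns to B.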
