Article,

Accurate estimation of substitution rates with neighbor-dependent models in a phylogenetic context

, and .
Syst Biol, 61 (3): 510-521 (May 2012)
DOI: 10.1093/sysbio/sys024

Abstract

Most models and algorithms developed to perform statistical inference from DNA data make the assumption that substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability but is in disagreement with observed data in many situations--one well-known example being CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95 + YpR substitution models, which allows neighbor-dependent effects--including CpG hypermutability--to be taken into account, through transitions between pyrimidine-purine dinucleotides. We show that it is possible to adapt inference methods originally developed under the assumption of independence between sites to RN95 + YpR models, using a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficient this approach is at inferring the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of 10 DNA sequences from primate species. Model comparisons within the RN95 + YpR class show the importance of taking into account neighbor-dependent effects. An application of the method to the detection of hypomethylated islands is discussed.

Tags

Users

  • @peter.ralph

Comments and Reviews