Abstract
About 2% of human genetic polymorphisms have been hypothesized to arise via
multinucleotide mutations (MNMs), complex events that generate SNPs at multiple
sites in a single generation. MNMs have the potential to accelerate the pace at
which single genes evolve and to confound studies of demography and selection
that assume all SNPs arise independently. In this paper, we examine clustered
mutations that are segregating in a set of 1,092 human genomes, demonstrating
that MNMs become enriched as large numbers of individuals are sampled. We
leverage the size of the dataset to deduce new information about the allelic
spectrum of MNMs, estimating the percentage of linked SNP pairs that were
generated by simultaneous mutation as a function of the distance between the
affected sites and showing that MNMs exhibit a high percentage of transversions
relative to transitions. These findings are reproducible in data from multiple
sequencing platforms. Among tandem mutations that occur simultaneously at
adjacent sites, we find an especially skewed distribution of ancestral and
derived dinucleotides, with $GC\to AA$, $GA\to TT$ and their reverse complements
making up 36% of the total. These
same mutations dominate the spectrum of tandem mutations produced by the
upregulation of low-fidelity Polymerase $\zeta$ in mutator strains of S.
cerevisiae that have impaired DNA excision repair machinery. This suggests that
low-fidelity DNA replication by Pol $\zeta$ is at least partly responsible for
the MNMs that are segregating in the human population, and that useful
information about the biochemistry of MNMs can be extracted from ordinary
population genomic data. We incorporate our findings into a mathematical model
of the multinucleotide mutation process that can be used to correct
phylogenetic and population genetic methods for the presence of MNMs.
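The abstract pools tandem mutations with their reverse complements and contrasts transitions with transversions. The Python sketch below illustrates those two bookkeeping steps under simple assumptions: each mutation is given as a hypothetical (ancestral, derived) dinucleotide pair read from one strand, and a mutation is pooled with its reverse complement by keeping the lexicographically smaller pair. The helper names (`revcomp`, `canonical_class`, `count_site_changes`, `tally_classes`) are illustrative and do not come from the paper. This is not the authors' pipeline or their mutation model.

```python
from collections import Counter

# Watson-Crick complements and the purine set used to classify substitutions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_class(ancestral: str, derived: str) -> tuple[str, str]:
    """Pool a dinucleotide mutation with its reverse complement.

    A mutation and its reverse complement describe the same event read
    from opposite strands, so both map to the lexicographically smaller
    (ancestral, derived) pair (an arbitrary but consistent convention).
    """
    forward = (ancestral, derived)
    reverse = (revcomp(ancestral), revcomp(derived))
    return min(forward, reverse)


def is_transversion(ref: str, alt: str) -> bool:
    """True if a single-base change swaps a purine for a pyrimidine or vice versa."""
    return (ref in PURINES) != (alt in PURINES)


def count_site_changes(ancestral: str, derived: str) -> tuple[int, int]:
    """Return (transitions, transversions) over the sites that changed."""
    transitions = transversions = 0
    for ref, alt in zip(ancestral, derived):
        if ref == alt:
            continue  # unchanged site
        if is_transversion(ref, alt):
            transversions += 1
        else:
            transitions += 1
    return transitions, transversions


def tally_classes(mutations: list[tuple[str, str]]) -> Counter:
    """Count mutations per reverse-complement-pooled class."""
    return Counter(canonical_class(anc, der) for anc, der in mutations)
```

For example, $GC\to AA$ and $GC\to TT$ fall into the same pooled class, as do $GA\to TT$ and $TC\to AA$. Within $GC\to AA$, the $G\to A$ change is a transition and the $C\to A$ change is a transversion, while both changes in $GA\to TT$ are transversions.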