Abstract

There is uncertainty about the true nature of predicted single-nucleotide polymorphisms (SNPs) in segmental duplications (duplicons) and about whether these markers genuinely exist at the increased density indicated in public databases. We explored these issues by genotyping 157 predicted SNPs in duplicons and control regions in normal diploid genomes and in fully homozygous complete hydatidiform moles. Our data identified many true SNPs in duplicon regions and few paralogous sequence variants. Twenty-eight percent of the polymorphic duplicon sequences we tested involved multisite variation (MSV), a new type of polymorphism representing the sum of the signals from many individual duplicon copies that vary in sequence content due to duplication, deletion or gene conversion. MSVs can masquerade as normal SNPs when genotyped. Given that duplicons comprise at least 5% of the genome and many are yet to be annotated in the genome draft, effective strategies to identify MSVs must be established and deployed.

In conclusion, our study identifies MSVs as a new form of genome polymorphism. Careful laboratory practice should often recognize MSVs as aberrant markers, and MSVs may underlie the considerable fraction of markers that fail tests of Hardy-Weinberg equilibrium (HWE). However, some MSVs are probably being interpreted and used as unique SNPs, and HWE testing will not always identify these, even when large sample numbers are used. More generally, MSVs (or rather duplicon copy-number variation and duplicon gene conversion processes) might underlie some common phenotypic differences between individuals. We therefore suggest that MSVs should be specifically targeted for evaluation in disease and pharmacogenomics research.
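
As an illustration of the HWE check mentioned above, the following is a minimal sketch of a one-degree-of-freedom chi-square test on biallelic genotype counts. It is not the procedure used in the study; the function name `hwe_chi_square` and the example counts are hypothetical, chosen only to show why a marker that summed signal from several duplicon copies might appear as an excess of heterozygotes.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic marker.

    Hypothetical helper for illustration only. Takes observed counts of the
    three genotype classes and returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies estimated from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    # Classes with zero expectation (monomorphic markers) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: a marker at which every sample appears heterozygous,
# as could happen if two duplicon copies carry different bases.
chi2, p_value = hwe_chi_square(0, 100, 0)
print(chi2, p_value)
```

A marker with counts close to Hardy-Weinberg proportions would pass this test regardless of whether it is a unique SNP, which is consistent with the point that HWE testing alone will not always flag an MSV.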
