Article,

Misclassification in binary responses and effect on genome-wide association studies

Poultry Science, 92 (9): 2535-2540 (2013)
DOI: 10.3382/ps.2012-02738

Abstract

Misclassification of dependent variables is a major issue in many areas of science. It can arise when indirect markers are used to classify subjects or when continuous traits are treated as categorical. In human medicine, this can have significant impacts on diagnostic accuracy. In animal science applications, misclassification can negatively affect both the accuracy of selection and the ability to ascertain the biological mechanisms for traits of interest. When dealing with traits influenced by genetic factors, genomic markers, such as SNP, can provide direct measurements of the underlying mechanisms controlling phenotypic responses. Unfortunately, in the presence of misclassification in the discrete dependent variables, the robustness of the analysis and the validity of the results could be severely compromised. To quantify the impact of misclassification on genome-wide association studies for binary responses, a simulation based on real data was carried out. The simulated data consisted of 2,400 animals genotyped for 50K SNP. A binary trait with heritability equal to 0.10 and prevalence of 20% was generated. Misclassification rates of 1, 5, and 10% were artificially introduced into the true binary responses. Using a latent-threshold model, 3 analyses were carried out for each misclassification rate using 1) the true data (M1), 2) the contaminated data, ignoring misclassification (M2), and 3) the contaminated data, accounting for misclassification (M3). The results indicate that ignoring misclassification when it exists in the data, as in M2, leads to major deterioration in the performance of the model. When misclassification was accounted for in the model (M3), the results indicated a strong capacity of the procedure to deal with potential misclassification in the training set. In fact, a large portion of the miscoded samples in the training set was identified and corrected. The results of this study suggest that the proposed method is adequate and effective for practical genome-wide association studies for binary response classification.
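
The simulation design described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract states the data were derived from real genotypes, whereas here genotypes are drawn at random, the number of SNP is reduced for speed, heritability is assumed to be defined on the liability scale, and misclassification is assumed to flip a fixed fraction of records chosen uniformly at random. The function names `simulate_binary_trait` and `contaminate` are hypothetical.

```python
import numpy as np


def simulate_binary_trait(n_animals=2400, n_snp=500, h2=0.10,
                          prevalence=0.20, seed=0):
    """Generate genotypes and a true binary trait from a liability threshold.

    Assumptions (not from the source): random allele frequencies, normally
    distributed SNP effects, heritability defined on the liability scale.
    """
    rng = np.random.default_rng(seed)
    # Genotypes coded 0/1/2, one allele frequency per SNP.
    freq = rng.uniform(0.1, 0.9, size=n_snp)
    geno = rng.binomial(2, freq, size=(n_animals, n_snp))
    # Genetic values scaled so their variance equals h2.
    g = geno @ rng.normal(size=n_snp)
    g = (g - g.mean()) / g.std() * np.sqrt(h2)
    # Residuals with variance 1 - h2, so total liability variance is about 1.
    e = rng.normal(scale=np.sqrt(1.0 - h2), size=n_animals)
    liability = g + e
    # Animals above the empirical (1 - prevalence) quantile are cases.
    threshold = np.quantile(liability, 1.0 - prevalence)
    y_true = (liability > threshold).astype(int)
    return geno, y_true


def contaminate(y_true, rate, rng):
    """Flip a fraction `rate` of the binary records, chosen at random."""
    n_flip = int(round(rate * y_true.size))
    flipped = rng.choice(y_true.size, size=n_flip, replace=False)
    y_obs = y_true.copy()
    y_obs[flipped] = 1 - y_obs[flipped]
    return y_obs, flipped


if __name__ == "__main__":
    geno, y_true = simulate_binary_trait()
    rng = np.random.default_rng(1)
    for rate in (0.01, 0.05, 0.10):
        y_obs, flipped = contaminate(y_true, rate, rng)
        print(f"rate={rate:.2f}: {flipped.size} records flipped, "
              f"observed prevalence={y_obs.mean():.3f}")
```

In this sketch, `y_true` plays the role of the data used in M1 and `y_obs` the contaminated data used in M2 and M3; the indices in `flipped` would let one check how many miscoded records a model that accounts for misclassification manages to identify.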
