A comprehensive imputation-based evaluation of tag SNP selection strategies

Dat Thanh Nguyen, Hieu Quang Dinh, Giang Minh Vu, Duong Thuy Nguyen, Nam Sy Vo

Abstract

Although sequencing technology has developed rapidly, single nucleotide polymorphism (SNP) arrays remain widely used in large-scale genomic studies because of their cost-effectiveness. Recently, in parallel with advances in imputation strategies, several genotyping platforms have been developed for various species. Despite the importance of imputation accuracy in SNP array design, to the best of our knowledge, no systematic study has evaluated tag SNP selection methods on this metric. In this paper, using leave-one-out cross-validation on the 1000 Genomes high-coverage dataset, we comprehensively evaluate four well-known tag SNP selection algorithms based on imputation accuracy. Our results show that although all widely used methods for SNP array design provide reasonable imputation accuracy, the pairwise linkage disequilibrium-based tag SNP selection algorithm achieves the best performance. Our pipelines for running the evaluated algorithms and the leave-one-out cross-validation are available for public use at https://github.com/datngu/TagSNP_evaluation.

Keywords: Measurement, Knowledge engineering, Couplings, Sequential analysis, Systematics, Pipelines, Genomics
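
To make the idea of pairwise linkage disequilibrium based tag SNP selection concrete, the following is a minimal sketch of one common greedy formulation. It is not taken from the paper or its repository: the function names, the 0.8 r^2 threshold, and the toy dosage matrix are all illustrative assumptions. The sketch repeatedly picks the SNP that covers the most still-untagged SNPs at or above the threshold.

```python
import numpy as np


def pairwise_r2(genotypes):
    """Squared Pearson correlation between SNP columns.

    `genotypes` is a (samples x SNPs) matrix of 0/1/2 allele dosages.
    Monomorphic SNPs yield NaN correlations, which are mapped to 0 here.
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(genotypes, rowvar=False)
    return np.nan_to_num(corr ** 2)


def greedy_tag_snps(genotypes, r2_threshold=0.8):
    """Hypothetical greedy tag SNP selection (illustrative only).

    Returns column indices of the chosen tag SNPs, in selection order.
    """
    r2 = pairwise_r2(genotypes)
    covers = r2 >= r2_threshold
    # Every SNP covers itself, so the loop below always makes progress.
    np.fill_diagonal(covers, True)

    untagged = np.ones(covers.shape[0], dtype=bool)
    tags = []
    while untagged.any():
        # Count how many untagged SNPs each candidate would cover.
        counts = (covers & untagged).sum(axis=1)
        best = int(np.argmax(counts))
        tags.append(best)
        untagged &= ~covers[best]
    return tags


# Toy example (assumed data): 6 samples, 4 SNPs.
demo = np.array([
    [0, 0, 2, 0],
    [1, 1, 1, 0],
    [2, 2, 0, 0],
    [0, 0, 2, 1],
    [1, 1, 1, 1],
    [2, 2, 0, 1],
])
print(greedy_tag_snps(demo))
```

In this toy matrix the first three SNPs are in perfect LD with one another and the fourth is uncorrelated with them, so a single tag suffices for the first group. Real pipelines work on phased reference panels within genomic windows and are then scored by imputation accuracy, which this sketch does not attempt.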
