Prediction of biogeographical ancestry from genotype: a comparison of classifiers

Citations: 20
Authors
Cheung, Elaine Y. Y. [1]
Gahan, Michelle Elizabeth [1]
McNevin, Dennis [1]
Affiliations
[1] Univ Canberra, Natl Ctr Forens Studies, Fac Educ Sci Technol & Math ESTeM, Bruce, ACT 2601, Australia
Keywords
Biogeographical ancestry (BGA); Phenotype prediction; STRUCTURE; Bayesian; Genetic distance; Multinomial logistic regression; DETERMINING CONTINENTAL ORIGIN; GENOME-WIDE PATTERNS; POPULATION-STRUCTURE; INFORMATIVE MARKERS; DIVERSITY; ADMIXTURE; PANEL; ASSAY; AMERICANS; INFERENCE;
DOI
10.1007/s00414-016-1504-3
Chinese Library Classification (CLC) number
DF [Law]; D9 [Law]; R [Medicine and Health];
Discipline classification code
0301 ; 10 ;
Abstract
DNA can provide forensic intelligence regarding a donor's biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE with that of three generic classification approaches: a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A selection of 142 ancestry informative single nucleotide polymorphisms (SNPs) was chosen from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd's AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. A training set of 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database was used by each classifier to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the areas under the receiver operating characteristic curves (AUROCs) showed high accuracy for STRUCTURE and the generic Bayesian approach. The latter algorithm offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and is suitable for phenotype prediction, whereas STRUCTURE is not.
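To make the idea of a generic Bayesian classifier concrete, the following is a minimal sketch and not the authors' implementation, whose details the abstract does not give. It assumes genotypes coded as 0, 1, or 2 copies of one allele, Hardy-Weinberg proportions within each population, independence between SNPs, a uniform prior over populations, and a small pseudocount on allele frequencies. Missing genotypes (`None`) are skipped. All function names, the population labels, and the toy data are hypothetical and chosen for illustration only.

```python
import math

def allele_freqs(genotypes, pseudo=0.5):
    """Estimate one allele frequency per SNP from 0/1/2 genotypes (None = missing).

    The pseudocount (an assumption of this sketch) keeps frequencies away from 0 and 1.
    """
    n_snps = len(genotypes[0])
    freqs = []
    for j in range(n_snps):
        called = [g[j] for g in genotypes if g[j] is not None]
        freqs.append((sum(called) + pseudo) / (2 * len(called) + 2 * pseudo))
    return freqs

def log_likelihood(genotype, freqs):
    """Log-likelihood of a genotype given allele frequencies, assuming
    Hardy-Weinberg proportions and independent SNPs; missing calls are skipped."""
    total = 0.0
    for g, p in zip(genotype, freqs):
        if g is None:
            continue
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

def posterior(genotype, freqs_by_pop):
    """Posterior probability of each population under a uniform prior."""
    logs = {pop: log_likelihood(genotype, f) for pop, f in freqs_by_pop.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {pop: math.exp(v - top) for pop, v in logs.items()}
    norm = sum(weights.values())
    return {pop: w / norm for pop, w in weights.items()}

# Toy training data (invented): three individuals per population, three SNPs.
training = {
    "POP_A": [[2, 2, 0], [2, 1, 0], [1, 2, 0]],
    "POP_B": [[0, 0, 2], [0, 1, 2], [1, 0, 1]],
}
freqs_by_pop = {pop: allele_freqs(gs) for pop, gs in training.items()}

# Query individual with the second SNP missing.
result = posterior([2, None, 0], freqs_by_pop)
best = max(result, key=result.get)
print(best, result)
```

Because missing genotypes simply drop out of the likelihood, the same function can be applied to profiles with any fraction of missing calls, which is the kind of situation the study's missing-data tests examine. An individual with no called genotypes falls back to the prior.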
Pages: 901-912
Page count: 12