Predicting genome-wide redundancy using machine learning

Cited by: 7
Authors
Chen, Huang-Wen [2]
Bandyopadhyay, Sunayan [2, 3]
Shasha, Dennis E. [2]
Birnbaum, Kenneth D. [1]
Affiliations
[1] NYU, Dept Biol, Ctr Genom & Syst Biol, New York, NY 10003 USA
[2] NYU, Dept Comp Sci, Courant Inst Math Sci, New York, NY 10003 USA
[3] Univ Minnesota Twin Cities, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
Source
BMC EVOLUTIONARY BIOLOGY | 2010 / Vol. 10
Keywords
GENE-EXPRESSION MAP; ARABIDOPSIS ROOT; SACCHAROMYCES-CEREVISIAE; DUPLICATE GENES; PHENOTYPE; NETWORKS; EVOLUTION; BIOLOGY; BIOINFORMATICS; PRESERVATION;
DOI
10.1186/1471-2148-10-357
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here.
Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single-trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single-attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically fewer than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods.
Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.
Pages: 15
Related papers
50 in total
  • [1] Predicting genome-wide redundancy using machine learning
    Huang-Wen Chen
    Sunayan Bandyopadhyay
    Dennis E Shasha
    Kenneth D Birnbaum
    BMC Evolutionary Biology, 10
  • [2] Predicting Genitourinary Toxicity by Machine Learning on Genome-Wide Single Nucleotide Polymorphisms
    Lee, S.
    Oh, J.
    Kerns, S.
    Rosenstein, B.
    Ostrer, H.
    Deasy, J.
    RADIOTHERAPY AND ONCOLOGY, 2018, 127 : S297 - S298
  • [3] Machine Learning in Genome-Wide Association Studies
    Szymczak, Silke
    Biernacka, Joanna M.
    Cordell, Heather J.
    Gonzalez-Recio, Oscar
    Koenig, Inke R.
    Zhang, Heping
    Sun, Yan V.
    GENETIC EPIDEMIOLOGY, 2009, 33 : S51 - S57
  • [4] DEVELOPING AND EVALUATING A STANDARDIZED, MACHINE LEARNING WORKFLOW FOR PREDICTING PSYCHIATRIC PHENOTYPES USING GENOME-WIDE AND CLINICAL DATA
    Marshe, Victoria
    Hauschild, Anne-Christin
    Maciukiewicz, Malgorzata
    Mueller, Daniel
    EUROPEAN NEUROPSYCHOPHARMACOLOGY, 2019, 29 : S228 - S229
  • [5] Machine learning approaches to genome-wide association studies
    Enoma, David O.
    Bishung, Janet
    Abiodun, Theresa
    Ogunlana, Olubanke
    Osamor, Victor Chukwudi
    JOURNAL OF KING SAUD UNIVERSITY SCIENCE, 2022, 34 (04)
  • [6] Editorial: Machine Learning in Genome-Wide Association Studies
    Hu, Ting
    Darabos, Christian
    Urbanowicz, Ryan
    FRONTIERS IN GENETICS, 2020, 11
  • [7] Genome-wide prediction of discrete traits using bayesian regressions and machine learning
    Oscar González-Recio
    Selma Forni
    Genetics Selection Evolution, 43
  • [8] Genome-wide prediction of discrete traits using bayesian regressions and machine learning
    Gonzalez-Recio, Oscar
    Forni, Selma
    GENETICS SELECTION EVOLUTION, 2011, 43
  • [9] Leveraging machine learning to advance genome-wide association studies
    Dagasso, Gabrielle
    Yan, Yan
    Wang, Lipu
    Li, Longhai
    Kutcher, Randy
    Zhang, Wentao
    Jin, Lingling
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2021, 25 (1-2) : 17 - 36
  • [10] Machine Learning to Advance Human Genome-Wide Association Studies
    Sigala, Rafaella E.
    Lagou, Vasiliki
    Shmeliov, Aleksey
    Atito, Sara
    Kouchaki, Samaneh
    Awais, Muhammad
    Prokopenko, Inga
    Mahdi, Adam
    Demirkan, Ayse
    GENES, 2024, 15 (01)