Predicting genome-wide redundancy using machine learning

Cited by: 7
Authors
Chen, Huang-Wen [2]
Bandyopadhyay, Sunayan [2, 3]
Shasha, Dennis E. [2]
Birnbaum, Kenneth D. [1]
Affiliations
[1] NYU, Dept Biol, Ctr Genom & Syst Biol, New York, NY 10003 USA
[2] NYU, Dept Comp Sci, Courant Inst Math Sci, New York, NY 10003 USA
[3] Univ Minnesota Twin Cities, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
Source
BMC EVOLUTIONARY BIOLOGY | 2010 / Volume 10
Keywords
GENE-EXPRESSION MAP; ARABIDOPSIS ROOT; SACCHAROMYCES-CEREVISIAE; DUPLICATE GENES; PHENOTYPE; NETWORKS; EVOLUTION; BIOLOGY; BIOINFORMATICS; PRESERVATION;
DOI
10.1186/1471-2148-10-357
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here.
Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single-trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single-attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically fewer than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods.
Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.
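The precision comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it uses no Support Vector Machine, only two hand-written threshold rules, and the gene-pair values, the thresholds, and the names `TOY_PAIRS`, `single_attribute`, and `combined` are all invented for illustration. It shows how precision (the fraction of "redundant" calls that are correct) can rise when a second attribute is required in addition to sequence similarity.

```python
from typing import Callable, List, Tuple

# Each toy gene pair: (sequence_similarity, expression_correlation, is_redundant).
# All values are invented for illustration; they are NOT data from the paper.
Pair = Tuple[float, float, bool]

TOY_PAIRS: List[Pair] = [
    (0.95, 0.90, True),
    (0.90, 0.85, True),
    (0.92, 0.10, False),
    (0.88, 0.20, False),
    (0.40, 0.80, False),
    (0.30, 0.15, False),
]


def precision(pairs: List[Pair], predict: Callable[[Pair], bool]) -> float:
    """Fraction of pairs called redundant that are truly redundant."""
    called = [p for p in pairs if predict(p)]
    if not called:
        return 0.0  # no positive calls: define precision as 0 to avoid dividing by zero
    true_positives = sum(1 for p in called if p[2])
    return true_positives / len(called)


def single_attribute(pair: Pair) -> bool:
    """Hypothetical single-trait rule: call redundant on sequence similarity alone."""
    return pair[0] >= 0.85


def combined(pair: Pair) -> bool:
    """Hypothetical two-attribute rule: require similarity AND co-expression."""
    return pair[0] >= 0.85 and pair[1] >= 0.5


single_precision = precision(TOY_PAIRS, single_attribute)
combined_precision = precision(TOY_PAIRS, combined)

print(f"single-attribute precision: {single_precision:.2f}")
print(f"combined-attribute precision: {combined_precision:.2f}")
```

The toy data were constructed so that the effect is visible and do not reproduce the paper's results. A trained classifier such as an SVM would learn the decision boundary from labeled pairs instead of relying on fixed thresholds.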
Pages: 15
Related papers
50 in total
  • [21] Revisiting genome-wide association studies from statistical modelling to machine learning
    Sun, Shanwen
    Dong, Benzhi
    Zou, Quan
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (04)
  • [22] Genome-wide association studies of ischemic stroke based on interpretable machine learning
    Nikoli, Stefan
    Ignatov, Dmitry I.
    Khvorykh, Gennady V.
    Limborska, Svetlana A.
    Khrunin, Andrey V.
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [23] Valid inference for machine learning-assisted genome-wide association studies
    Miao, Jiacheng
    Wu, Yixuan
    Sun, Zhongxuan
    Miao, Xinran
    Lu, Tianyuan
    Zhao, Jiwei
    Lu, Qiongshi
    NATURE GENETICS, 2024, : 2361 - 2369
  • [24] Redundancy and rewiring of genetic networks following genome-wide duplication events
    De Smet, Riet
    Van de Peer, Yves
    CURRENT OPINION IN PLANT BIOLOGY, 2012, 15 (02) : 168 - 176
  • [25] DeepCGP: A Deep Learning Method to Compress Genome-Wide Polymorphisms for Predicting Phenotype of Rice
    Islam, Tanzila
    Kim, Chyon Hae
    Iwata, Hiroyoshi
    Hiroyuki, Shimono
    Kimura, Akio
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (03) : 2078 - 2088
  • [26] Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data
    Alberto Romagnoni
    Simon Jégou
    Kristel Van Steen
    Gilles Wainrib
    Jean-Pierre Hugot
    Scientific Reports, 9
  • [27] Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning
    Xie, Li
    Xie, Lei
    PLOS COMPUTATIONAL BIOLOGY, 2023, 19 (08)
  • [28] Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data
    Romagnoni, Alberto
    Jegou, Simon
    Van Steen, Kristel
    Wainrib, Gilles
    Hugot, Jean-Pierre
    Peyrin-Biroulet, Laurent
    Chamaillard, Mathias
    Colombel, Jean-Frederick
    Cottone, Mario
    D'Amato, Mauro
    D'Inca, Renata
    Halfvarson, Jonas
    Henderson, Paul
    Karban, Amir
    Kennedy, Nicholas A.
    Khan, Mohammed Azam
    Lemann, Marc
    Levine, Arie
    Massey, Dunecan
    Milla, Monica
    Ng, Sok Meng Evelyn
    Oikonomou, Ioannis
    Peeters, Harald
    Proctor, Deborah D.
    Rahier, Jean-Francois
    Rutgeerts, Paul
    Seibold, Frank
    Stronati, Laura
    Taylor, Kirstin M.
    Torkvist, Leif
    Ublick, Kullak
    Van Limbergen, Johan
    Van Gossum, Andre
    Vatn, Morten H.
    Zhang, Hu
    Zhang, Wei
    Andrews, Jane M.
    Bampton, Peter A.
    Barclay, Murray
    Florin, Timothy H.
    Gearry, Richard
    Krishnaprasad, Krupa
    Lawrance, Ian C.
    Mahy, Gillian
    Montgomery, Grant W.
    Radford-Smith, Graham
    Roberts, Rebecca L.
    Simms, Lisa A.
    Hanigan, Katherine
    Croft, Anthony
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [29] Explainable Machine Learning Model for Alzheimer Detection Using Genetic Data: A Genome-Wide Association Study Approach
    Khater, Tarek
    Ansari, Sam
    Saad Alatrany, Abbas
    Alaskar, Haya
    Mahmoud, Soliman
    Turky, Ayad
    Tawfik, Hissam
    Almajali, Eqab
    Hussain, Abir
    IEEE ACCESS, 2024, 12 : 95091 - 95105
  • [30] Prediction of Breast Cancer Treatment-Induced Fatigue by Machine Learning Using Genome-Wide Association Data
    Lee, Sangkyu
    Deasy, Joseph O.
    Oh, Jung Hun
    Di Meglio, Antonio
    Dumas, Agnes
    Menvielle, Gwenn
    Charles, Cecile
    Boyault, Sandrine
    Rousseau, Marina
    Besse, Celine
    Thomas, Emilie
    Boland, Anne
    Cottu, Paul
    Tredan, Olivier
    Levy, Christelle
    Martin, Anne-Laure
    Everhard, Sibille
    Ganz, Patricia A.
    Partridge, Ann H.
    Michiels, Stefan
    Deleuze, Jean-Francois
    Andre, Fabrice
    Vaz-Luis, Ines
    JNCI CANCER SPECTRUM, 2020, 4 (05)