Interpretable Machine Learning of Amino Acid Patterns in Proteins: A Statistical Ensemble Approach

Cited by: 2
Authors
Braghetto, Anna [1, 2]
Orlandini, Enzo [1, 2]
Baiesi, Marco [1, 2]
Affiliations
[1] Univ Padua, Dept Phys & Astron, Via Marzolo 8, I-35131 Padua, Italy
[2] INFN, Sez Padova, Via Marzolo 8, I-35131 Padua, Italy
Keywords
SECONDARY STRUCTURE; POLAR; PREDICTION; DESIGN;
DOI
10.1021/acs.jctc.3c00383
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification codes
070304; 081704;
Abstract
Explainable and interpretable unsupervised machine learning helps one to understand the underlying structure of data. We introduce an ensemble analysis of machine learning models to consolidate their interpretation. Its application shows that restricted Boltzmann machines compress consistently into a few bits the information stored in a sequence of five amino acids at the start or end of α-helices or β-sheets. The weights learned by the machines reveal unexpected properties of the amino acids and the secondary structure of proteins: (i) His and Thr have a negligible contribution to the amphiphilic pattern of α-helices; (ii) there is a class of α-helices particularly rich in Ala at their end; (iii) Pro occupies most often slots otherwise occupied by polar or charged amino acids, and its presence at the start of helices is relevant; (iv) Glu and especially Asp on one side and Val, Leu, Iso, and Phe on the other display the strongest tendency to mark amphiphilic patterns, i.e., extreme values of an effective hydrophobicity, though they are not the most powerful (non)hydrophobic amino acids.
Pages: 6011-6022
Page count: 12