Discovering and Overcoming the Bias in Neoantigen Identification by Unified Machine Learning Models

Cited by: 0
Authors
Zhang, Ziting
Wu, Wenxu
Wei, Lei
Wang, Xiaowo [1]
Affiliations
[1] Tsinghua Univ, Minist Educ, Key Lab Bioinformat, Beijing, Peoples R China
Source
RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024 | 2024 / Vol. 14758
Keywords
neoantigen identification; data bias; machine learning; attention mechanism
DOI
10.1007/978-1-0716-3989-4_28
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Neoantigens, formed by genetic mutations in tumor cells, are abnormal peptides that can trigger immune responses. Precisely identifying neoantigens among the vast number of mutations is key to designing tumor immunotherapies. The neoantigen immune process has three main steps: binding to MHCs, extracellular presentation, and induction of immunogenicity. Various machine learning methods have been developed to predict the probability of one of these three events, but the overall accuracy of neoantigen identification remains far from satisfactory. To gain a systematic understanding of the key factors in neoantigen identification, we developed ImmuBPI, a unified transformer-based machine learning framework that comprises three tasks and achieves state-of-the-art performance. Through cross-task model interpretation, we discovered that data bias in immunogenicity prediction has been underestimated, which has skewed the decision boundaries of current machine learning models. We designed a mutual information-based debiasing strategy that performed well on immunogenicity prediction for mutation variants, a task where current methods fall short. Clustering immunogenic peptides by their debiased representations uncovers distinct preferences for biophysical properties such as hydrophobicity and polarity. These observations complement the prevailing understanding that accurate neoantigen prediction is constrained by limited data, and they highlight the necessity of bias control. We expect this study to provide novel perspectives on neoantigen prediction methods and to benefit future neoantigen-mediated immunotherapy designs.
Pages: 348-351
Page count: 4