Discovering and Overcoming the Bias in Neoantigen Identification by Unified Machine Learning Models

Cited by: 0

Authors:

Zhang, Ziting

Wu, Wenxu

Wei, Lei

Wang, Xiaowo [1]

Affiliation:

[1] Tsinghua Univ, Minist Educ, Key Lab Bioinformat, Beijing, Peoples R China

Source:

RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024 | 2024 / Vol. 14758

Keywords:

neoantigen identification; data bias; machine learning; attention mechanism

DOI:

10.1007/978-1-0716-3989-4_28

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Neoantigens, formed by genetic mutations in tumor cells, are abnormal peptides that can trigger immune responses. Precisely identifying neoantigens among a vast number of mutations is the key to tumor immunotherapy design. There are three main steps in the neoantigen immune process, i.e., binding with MHCs, extracellular presentation, and induction of immunogenicity. Various machine learning methods have been developed to predict the probability of one of the three events, but the overall accuracy of neoantigen identification remains far from satisfactory. To gain a systematic understanding of the key factors of neoantigen identification, we developed a unified transformer-based machine learning framework, ImmuBPI, that comprises three tasks and achieves state-of-the-art performance. Through cross-task model interpretation, we discovered that data bias in immunogenicity prediction has been underestimated, which has led to skewed discriminatory boundaries in current machine learning models. We designed a mutual information-based debiasing strategy that performed well on immunogenicity prediction for mutation variants, a task where current methods fell short. Clustering immunogenic peptides with debiased representations uncovered unique preferences for biophysical properties, such as hydrophobicity and polarity. These observations serve as an important complement to the past understanding that accurate neoantigen prediction is constrained by limited data, highlighting the necessity of bias control. We expect this study will provide novel and insightful perspectives for neoantigen prediction methods and benefit future neoantigen-mediated immunotherapy designs.


Pages: 348-351

Page count: 4
