Discovering and Overcoming the Bias in Neoantigen Identification by Unified Machine Learning Models

Authors
Zhang, Ziting
Wu, Wenxu
Wei, Lei
Wang, Xiaowo [1]
Affiliations
[1] Tsinghua Univ, Minist Educ, Key Lab Bioinformat, Beijing, Peoples R China
Source
RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024 | 2024 / Vol. 14758
Keywords
neoantigen identification; data bias; machine learning; attention mechanism
DOI
10.1007/978-1-0716-3989-4_28
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Neoantigens, formed by genetic mutations in tumor cells, are abnormal peptides that can trigger immune responses. Precisely identifying neoantigens among the vast number of mutations is the key to tumor immunotherapy design. There are three main steps in the neoantigen immune process, i.e., binding with MHCs, extracellular presentation, and induction of immunogenicity. Various machine learning methods have been developed to predict the probability of one of the three events, but the overall accuracy of neoantigen identification remains far from satisfactory. To gain a systematic understanding of the key factors of neoantigen identification, we developed a unified transformer-based machine learning framework, ImmuBPI, that comprises three tasks and achieved state-of-the-art performance. Through cross-task model interpretation, we discovered that data bias in immunogenicity prediction has been underestimated, which has led to skewed discriminative boundaries in current machine learning models. We designed a mutual information-based debiasing strategy that performed well on immunogenicity prediction for mutation variants, a task where current methods fell short. Clustering immunogenic peptides with debiased representations uncovers unique preferences for biophysical properties, such as hydrophobicity and polarity. These observations serve as an important complement to the past understanding that accurate neoantigen prediction is constrained by limited data, highlighting the necessity of bias control. We expect this study will provide novel and insightful perspectives for neoantigen prediction methods and benefit future neoantigen-mediated immunotherapy designs.
Pages: 348 - 351
Page count: 4