Biases in machine-learning models of human single-cell data

Authors
Willem, Theresa [1,2]
Shitov, Vladimir A. [3,4,5,6]
Luecken, Malte D. [3,4,5,6]
Kilbertus, Niki [2,7,8]
Bauer, Stefan [2,7,8]
Piraud, Marie [2]
Buyx, Alena [1]
Theis, Fabian J. [2,7,9]
Affiliations
[1] Technical University of Munich, TUM School of Medicine and Health, Institute of History and Ethics in Medicine, Munich, Germany
[2] Helmholtz Munich, Munich, Germany
[3] Helmholtz Munich, Institute of Computational Biology, Department of Computational Health, Munich, Germany
[4] Helmholtz Munich, Comprehensive Pneumology Center (CPC), CPC-M bioArchive, Munich, Germany
[5] Helmholtz Munich, Institute of Lung Health and Immunity (LHI), Munich, Germany
[6] German Center for Lung Research (DZL), Munich, Germany
[7] Technical University of Munich, School of Computation, Information and Technology, Munich, Germany
[8] Munich Center for Machine Learning (MCML), Munich, Germany
[9] Technical University of Munich, School of Life Sciences, Munich, Germany
Keywords
Genomics; Racism; Race
DOI
10.1038/s41556-025-01619-8
Chinese Library Classification number
Q2 [Cell biology]
Discipline classification codes
071009; 090102
Abstract
Recent machine-learning (ML)-based advances in single-cell data science have enabled the stratification of human tissue donors at single-cell resolution, promising valuable diagnostic and prognostic insights. However, such insights are susceptible to biases. Here we discuss the biases that emerge along the pipeline of ML-based single-cell analysis: societal biases that affect whose samples are collected; clinical and cohort biases that influence the generalizability of single-cell datasets; biases stemming from single-cell sequencing; ML biases specific to weakly supervised or unsupervised models trained on human single-cell samples; and biases in the interpretation of results from ML models. We conclude by presenting methods that single-cell data scientists can use to assess and mitigate biases, and we call for efforts to address the root causes of bias.
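One widely used way to assess the kind of model bias mentioned in the abstract is to evaluate a donor-stratification model separately within each donor subgroup and compare the results. The Python sketch below is an illustration under stated assumptions, not a method taken from this article: it assumes donor-level (not cell-level) true labels and predictions, plus a hypothetical grouping variable such as self-reported sex or recruitment site, and reports per-group accuracy and the largest gap between groups. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} for donor-level predictions.

    y_true, y_pred and groups are parallel sequences with one entry per donor.
    The grouping variable (e.g. sex or recruitment site) is an illustrative
    assumption, not something prescribed by the article.
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(acc_by_group):
    """Largest difference in accuracy between any two groups."""
    values = list(acc_by_group.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Hypothetical toy data: six donors from two subgroups, "A" and "B".
    y_true = ["disease", "healthy", "disease", "healthy", "disease", "healthy"]
    y_pred = ["disease", "healthy", "disease", "disease", "healthy", "healthy"]
    groups = ["A", "A", "A", "B", "B", "B"]
    acc = per_group_accuracy(y_true, y_pred, groups)
    print(acc)
    print(max_accuracy_gap(acc))
```

In the toy data, all three group A donors are classified correctly but only one of three group B donors is, so the gap is about 0.67. Evaluating per donor rather than per cell reflects that donors are the unit being stratified; with few donors per subgroup, such estimates are noisy and should be read with caution.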
Pages: 384-392
Number of pages: 9