scAnno: a deconvolution strategy-based automatic cell type annotation tool for single-cell RNA-sequencing data sets

Cited by: 6
Authors
Liu, Hongjia [1]
Li, Huamei [2]
Sharma, Amit [3]
Huang, Wenjuan [4]
Pan, Duo [1]
Gu, Yu [1]
Lin, Lu [1]
Sun, Xiao [1]
Liu, Hongde [1]
Affiliations
[1] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Digital Med Engn, Nanjing, Peoples R China
[2] Nanjing Univ, Nanjing Drum Tower Hosp, Affiliated Hosp, Med Sch,Dept Gen Surg, Nanjing, Peoples R China
[3] Univ Hosp Bonn, Bonn, Germany
[4] Southeast Univ, Southeast Univ Hosp, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
scRNA-seq data annotation; deconvolution; logistic regression; cell type-specific genes;
DOI
10.1093/bib/bbad179
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Undoubtedly, single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. Given that more such data sets will become available in the near future, their accurate assessment with compatible and robust models for cell type annotation is a prerequisite. Considering this, herein, we developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets primarily based on the single-cell cluster levels, using a joint deconvolution strategy and logistic regression. We explicitly constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues) to support this novel methodology (scAnno). scAnno offers a possibility to obtain genes with high expression and specificity in a given cell type as cell type-specific genes (marker genes) by combining co-expression genes with seed genes as a core. Of importance, scAnno can accurately identify cell type-specific genes based on cell type reference expression profiles without any prior information. Particularly, in the peripheral blood mononuclear cell data set, the marker genes identified by scAnno showed cell type-specific expression, and the majority of marker genes matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also proved its superiority in cell type annotation over other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) through internal validation of data sets (average annotation accuracy: 99.05%) and cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first novel methodology that utilizes a deconvolution strategy for automated cell typing and is capable of being a significant application in broader scRNA-seq analysis. scAnno is available at .
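The cluster-level deconvolution idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the scAnno implementation: it assumes, for illustration only, that a cluster's mean expression is a non-negative mixture of reference cell type profiles and that the cell type with the largest weight is taken as the label. The logistic regression step is omitted. The function names, the multiplicative-update solver and the toy expression values are all hypothetical.

```python
# Illustrative sketch only, NOT the scAnno implementation.
# Assumption: a cluster's mean expression is a non-negative mixture of
# reference cell type profiles; the largest mixture weight gives the label.


def nnls_multiplicative(ref, target, n_iter=500, eps=1e-12):
    """Approximate non-negative least squares with multiplicative updates.

    ref:    list of reference profiles, one per cell type; each profile is
            a list of non-negative expression values in the same gene order
    target: non-negative mean expression of one cluster (same gene order)
    Returns one non-negative weight per reference profile.
    """
    k = len(ref)
    # Inner products of each reference profile with the target.
    rt = [sum(r_g * t_g for r_g, t_g in zip(r, target)) for r in ref]
    # Gram matrix of the reference profiles.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in ref] for ri in ref]
    w = [1.0 / k] * k
    for _ in range(n_iter):
        # Denominators use the weights from the previous iteration.
        denom = [sum(gram[i][j] * w[j] for j in range(k)) for i in range(k)]
        w = [w[i] * rt[i] / (denom[i] + eps) for i in range(k)]
    return w


def annotate_cluster(ref_profiles, cluster_mean):
    """Return (best cell type, normalized weights) for one cluster."""
    names = list(ref_profiles)
    weights = nnls_multiplicative([ref_profiles[n] for n in names], cluster_mean)
    total = sum(weights) or 1.0
    fractions = {n: w / total for n, w in zip(names, weights)}
    return max(fractions, key=fractions.get), fractions


# Toy reference over three genes (CD3E, CD19, CD14); values are made up.
reference = {
    "T cell": [9.0, 0.5, 0.2],
    "B cell": [0.3, 8.0, 0.1],
    "Monocyte": [0.2, 0.1, 10.0],
}
cluster_mean = [8.5, 0.9, 0.4]  # hypothetical cluster dominated by CD3E

label, fractions = annotate_cluster(reference, cluster_mean)
print(label, fractions)
```

Working on cluster means rather than single cells keeps the toy problem small and mirrors the cluster-level annotation described in the abstract. A real reference would span thousands of genes and many more cell types.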
Pages: 12
Related papers
50 items in total
  • [41] scds: computational annotation of doublets in single-cell RNA sequencing data
    Bais, Abha S.
    Kostka, Dennis
    BIOINFORMATICS, 2020, 36 (04) : 1150 - 1158
  • [42] X Software Benchmark-Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data
    Alaqeeli, Omar
    Xing, Li
    Zhang, Xuekui
    MICROBIOLOGY RESEARCH, 2021, 12 (02) : 317 - 334
  • [43] scAnnotatR: framework to accurately classify cell types in single-cell RNA-sequencing data
    Vy Nguyen
    Johannes Griss
    BMC Bioinformatics, 23
  • [44] scAnnotatR: framework to accurately classify cell types in single-cell RNA-sequencing data
    Nguyen, Vy
    Griss, Johannes
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [45] A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data
    Barron, Martin
    Zhang, Siyuan
    Li, Jun
    NUCLEIC ACIDS RESEARCH, 2018, 46 (03)
  • [46] CALLR: a semi-supervised cell-type annotation method for single-cell RNA sequencing data
    Wei, Ziyang
    Zhang, Shuqin
    BIOINFORMATICS, 2021, 37 : I51 - I58
  • [47] A comparison of automatic cell identification methods for single-cell RNA sequencing data
    Abdelaal, Tamim
    Michielsen, Lieke
    Cats, Davy
    Hoogduin, Dylan
    Mei, Hailiang
    Reinders, Marcel J. T.
    Mahfouz, Ahmed
    GENOME BIOLOGY, 2019, 20 (01)
  • [48] A comparison of automatic cell identification methods for single-cell RNA sequencing data
    Tamim Abdelaal
    Lieke Michielsen
    Davy Cats
    Dylan Hoogduin
    Hailiang Mei
    Marcel J. T. Reinders
    Ahmed Mahfouz
    Genome Biology, 20
  • [49] Integration for single-cell RNA sequencing data based on the shared cell type assignment
    Zhang, Yin
    Wang, Fei
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 232 - 235
  • [50] Quantitative assessment of single-cell RNA-sequencing methods
    Angela R Wu
    Norma F Neff
    Tomer Kalisky
    Piero Dalerba
    Barbara Treutlein
    Michael E Rothenberg
    Francis M Mburu
    Gary L Mantalas
    Sopheak Sim
    Michael F Clarke
    Stephen R Quake
    Nature Methods, 2014, 11 : 41 - 46