Category encoding method to select feature genes for the classification of bulk and single-cell RNA-seq data

Cited by: 2
Authors
Zhou, Yan [1]
Zhang, Li [1]
Xu, Jinfeng [2]
Zhang, Jun [1]
Yan, Xiaodong [3]
Affiliations
[1] Shenzhen Univ, Coll Math & Stat, Inst Stat Sci, Shenzhen Key Lab Adv Machine Learning & Applicat, Shenzhen, Peoples R China
[2] Univ Hong Kong, Dept Math, Pokfulam, Hong Kong, Peoples R China
[3] Shandong Univ, Zhongtai Secur Inst Financial Studies, Jinan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
CAEN; classification; feature selection; single-cell RNA-seq; DISCRIMINANT-ANALYSIS; NORMALIZATION; FILTER; MODEL;
DOI
10.1002/sim.9015
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Bulk and single-cell RNA-seq (scRNA-seq) data are being used as alternatives to traditional technology in biological and medical research. These data are used, for example, for the detection of differentially expressed (DE) genes. Several statistical methods have been developed for the classification of bulk and single-cell RNA-seq data, and feature genes are vitally important for this classification. The majority of genes are not DE and are thus irrelevant for class distinction. To improve classification performance and save computation time, irrelevant genes need to be removed, which also aids the detection of the important feature genes. Widely used schemes in the literature, such as the BSS/WSS (BW) method, assume that data are normally distributed and may not be suitable for bulk and single-cell RNA-seq data. In this article, a category encoding (CAEN) method is proposed to select feature genes for bulk and single-cell RNA-seq data classification. This novel method encodes categories by employing the rank of sequence samples for each gene in each class. Correlation coefficients between gene and class are then computed from the rank of the samples and the new rank of the categories. The genes with the highest correlation coefficients are taken as feature genes, which are the most effective for classifying bulk and single-cell RNA-seq datasets. The sure screening method was also established for the rank consistency properties of the proposed CAEN method. Simulation studies show that the classifier using the proposed CAEN method performs better than, or at least as well as, the existing methods in most settings. Existing real datasets were analyzed, with the results demonstrating superior performance of the proposed method over current competitors. The method has been implemented in an R package named "CAEN" to facilitate wide use.
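The rank-based selection outlined in the abstract can be illustrated with a minimal Python sketch. It is not the CAEN R package and is not taken from the paper: the function names (`average_ranks`, `category_encoding_scores`, `select_top_genes`), the use of the mean sample rank per class to encode categories, and the use of an absolute Pearson correlation on ranks are all assumptions made for illustration only.

```python
import numpy as np


def average_ranks(x):
    """Return 1-based ranks of x, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks inside each group of ties by the group mean.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def category_encoding_scores(X, y):
    """Score each gene (column of X) against the class labels y.

    Assumed scheme, for illustration only:
      1. rank the samples within each gene;
      2. encode each class by the rank of its mean sample rank;
      3. score the gene by |correlation| between sample ranks and codes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)                 # sorted class labels
    class_index = np.searchsorted(classes, y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        sample_rank = average_ranks(X[:, j])
        class_mean = np.array(
            [sample_rank[class_index == c].mean() for c in range(len(classes))]
        )
        category_code = average_ranks(class_mean)[class_index]
        # A constant gene or a constant encoding carries no class signal.
        if sample_rank.std() == 0 or category_code.std() == 0:
            continue
        scores[j] = abs(np.corrcoef(sample_rank, category_code)[0, 1])
    return scores


def select_top_genes(X, y, k):
    """Return the indices of the k highest-scoring genes, best first."""
    scores = category_encoding_scores(X, y)
    return np.argsort(-scores, kind="mergesort")[:k]
```

With a samples-by-genes count matrix `X` and a label vector `y`, `select_top_genes(X, y, k)` returns the column indices of the `k` genes retained for a downstream classifier. Because the encoding is derived from the same data used for scoring, even noise genes receive a positive score in this sketch, so only the relative ordering of the scores is meaningful.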
Pages: 4077-4089
Number of pages: 13