Construction of a diagnostic classifier for cervical intraepithelial neoplasia and cervical cancer based on XGBoost feature selection and random forest model

Cited by: 3
Authors
Zhang, Jing [1]
Yang, Xiuqing [1]
Chen, Jia [1]
Han, Jing [1]
Chen, Xiaofeng [1]
Fan, Yueping [1]
Zheng, Hui [1]
Affiliations
[1] Jiangsu Xiangshui Hosp Chinese Med, Dept Gynaecol & Obstet, 2 Yinhe Rd, Yancheng 224600, Jiangsu, Peoples R China
Keywords
cervical cancer; cervical intraepithelial neoplasia; diagnostic markers; PPI network; XGBoost; DIGITAL REPEAT PHOTOGRAPHY; IMAGE TIME-SERIES; CELL-CYCLE; PHENOLOGY; VEGETATION; APOPTOSIS;
DOI
10.1111/jog.15458
Chinese Library Classification (CLC) number
R71 [Obstetrics and Gynecology];
Discipline classification code
100211 ;
Abstract
Background: The pathological phenotype of early-stage cervical cancer (CC) is similar to that of cervical intraepithelial neoplasia (CIN), which makes cervical precancerous lesions difficult to diagnose. Existing diagnostic methods are also partly subjective and have limitations, which can lead to misdiagnosis or missed diagnosis. Methods that assist the diagnosis of CC and CIN are therefore needed.
Methods: Based on CIN and CC data from the Gene Expression Omnibus (GEO) database, the eXtreme Gradient Boosting (XGBoost) algorithm was used to screen feature genes that distinguish CIN from CC for classifier construction. An incremental feature selection (IFS) curve was also used for screening. The reliability of the classifier was validated with principal component analysis (PCA) dimensionality reduction and a heat map of gene expression. Differentially expressed genes between CIN and CC were then intersected with the classifier genes. Genes in the intersection were used as seeds for protein-protein interaction (PPI) network construction and random walk with restart analysis. The genes with the top 50 affinity coefficients were selected for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses to identify biological functions that differ between CIN and CC.
Results: Peripheral blood genes from CIN and CC samples were analyzed, and seven genes were screened. Using these genes for classifier construction, IFS curve screening showed that the three-feature-gene classifier built with the random forest model performed best. PCA and the gene expression heat map showed that the three-gene classifier could effectively distinguish CIN from CC.
Conclusion: A three-gene diagnostic classifier can effectively distinguish CIN patients from CC patients and provide a reference for the clinical diagnosis of early CC.
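The random walk with restart step named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the placeholder node names (SEED_A, G1, and so on), the restart probability of 0.7, and the function name random_walk_with_restart are all assumptions made for illustration. The walker repeatedly moves to a random neighbour or jumps back to a seed, and the steady-state visiting probability of each node plays the role of its affinity to the seed genes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbours (undirected, symmetric).
    seeds:     nodes the walker restarts from.
    restart:   probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adjacency)
    seeds = [s for s in seeds if s in adjacency]
    if not seeds:
        raise ValueError("no seed node is present in the network")

    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart term: a fraction of the mass always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: keep its walking mass in place.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            # Walking term: spread the remaining mass evenly over neighbours.
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy PPI network; node names are placeholders, not genes from the paper.
toy_ppi = {
    "SEED_A": {"G1", "G2"},
    "SEED_B": {"G2", "G3"},
    "G1": {"SEED_A"},
    "G2": {"SEED_A", "SEED_B", "G4"},
    "G3": {"SEED_B"},
    "G4": {"G2"},
}

scores = random_walk_with_restart(toy_ppi, ["SEED_A", "SEED_B"])
# Rank nodes by affinity; a study like this one would keep the top 50.
top = sorted(scores, key=scores.get, reverse=True)[:3]
```

In this toy network, G2 is linked to both seeds and so scores higher than G4, which is reachable only through G2. Ranking all nodes by score and keeping the highest ones mirrors the selection of the top-affinity genes for enrichment analysis.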
Pages: 296-303
Number of pages: 8