Construction of a diagnostic classifier for cervical intraepithelial neoplasia and cervical cancer based on XGBoost feature selection and random forest model

Cited by: 3
Authors
Zhang, Jing [1]
Yang, Xiuqing [1]
Chen, Jia [1]
Han, Jing [1]
Chen, Xiaofeng [1]
Fan, Yueping [1]
Zheng, Hui [1]
Affiliations
[1] Jiangsu Xiangshui Hosp Chinese Med, Dept Gynaecol & Obstet, 2 Yinhe Rd, Yancheng 224600, Jiangsu, Peoples R China
Keywords
cervical cancer; cervical intraepithelial neoplasia; diagnostic markers; PPI network; XGBoost; DIGITAL REPEAT PHOTOGRAPHY; IMAGE TIME-SERIES; CELL-CYCLE; PHENOLOGY; VEGETATION; APOPTOSIS;
DOI
10.1111/jog.15458
Chinese Library Classification number
R71 [Obstetrics and Gynecology];
Discipline classification code
100211;
Abstract
Background: The pathological phenotype of early-stage cervical cancer (CC) is similar to that of cervical intraepithelial neoplasia (CIN), which makes cervical precancerous lesions difficult to diagnose. Existing diagnostic methods are also partly subjective and have limitations, which can lead to misdiagnosis or missed diagnosis. Methods that assist in the diagnosis of CC and CIN are therefore needed.

Methods: Using CIN and CC data from the Gene Expression Omnibus (GEO) database, the eXtreme Gradient Boosting (XGBoost) algorithm was applied to screen feature genes that distinguish CIN from CC for classifier construction. An incremental feature selection (IFS) curve was also used for screening. The reliability of the classifier was validated by principal component analysis (PCA) dimensionality reduction and by heat map analysis of gene expression. Differentially expressed genes between CIN and CC were then intersected with the classifier genes. Genes in the intersection were used as seeds for protein-protein interaction (PPI) network construction and random walk with restart analysis. The genes with the top 50 affinity coefficients were selected for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses to identify the biological functions that differ between CIN and CC.

Results: Peripheral blood genes of CIN and CC were analyzed, and seven genes were screened. With these genes used for classifier construction, IFS curve screening showed that the three-feature-gene classifier built with the random forest model performed best. PCA dimensionality reduction and gene expression heat map analysis showed that the three-gene classifier could effectively distinguish CIN from CC.

Conclusion: A three-gene diagnostic classifier can effectively distinguish CIN patients from CC patients and provides a reference for the clinical diagnosis of early CC.
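The incremental feature selection (IFS) step named in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, a simple mean-separation score stands in for XGBoost feature importance, and a nearest-centroid classifier with leave-one-out validation stands in for the random forest model. All function names and parameters here are hypothetical and chosen only for illustration.

```python
import random
import statistics


def make_toy_data(n_per_class=20, n_features=7, n_informative=3, seed=0):
    """Synthetic two-class data (assumption): the first n_informative features are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(3.0 * label if j < n_informative else 0.0, 1.0)
                   for j in range(n_features)]
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y):
    """Rank feature indices by class-mean separation (stand-in for XGBoost importance)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def nearest_centroid_predict(train_X, train_y, sample, features):
    """Predict the class whose centroid is closest on the chosen features (stand-in for random forest)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [statistics.mean(r[j] for r in rows) for j in features]
        dist = sum((sample[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the stand-in classifier on a feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)


def incremental_feature_selection(X, y):
    """Add ranked features one at a time and keep the smallest subset with the best accuracy."""
    ranked = rank_features(X, y)
    curve = [(k, loo_accuracy(X, y, ranked[:k])) for k in range(1, len(ranked) + 1)]
    best_k, _ = max(curve, key=lambda point: (point[1], -point[0]))
    return ranked[:best_k], curve


X, y = make_toy_data()
selected, curve = incremental_feature_selection(X, y)
for k, acc in curve:
    print(f"top {k} features: leave-one-out accuracy = {acc:.3f}")
print("selected feature indices:", selected)
```

The list `curve` is the IFS curve: accuracy as a function of the number of top-ranked features. For simplicity the sketch ranks features once on the full data set; ranking inside each validation fold would avoid an optimistic bias in the accuracy estimates.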
Pages: 296-303
Page count: 8