Selecting Training Samples for Ovarian Cancer Classification via a Semi-supervised Clustering Approach

Cited by: 0
Authors
Salguero, Jennifer L. [1]
Prasanna, Prateek [2]
Corredor, German [3, 4]
Cruz-Roa, Angel [5, 6]
Becerra, David [1]
Romero, Eduardo [1]
Affiliations
[1] Univ Nacl Colombia, Cimalab Res Grp, Bogota, Colombia
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
[3] Case Western Reserve Univ, Cleveland, OH 44106 USA
[4] Louis Stokes VA Med Ctr, Cleveland, OH USA
[5] Univ Los Llanos, AdaLab, Villavicencio, Colombia
[6] Univ Los Llanos, GITECX, Villavicencio, Colombia
Source
MEDICAL IMAGING 2022: DIGITAL AND COMPUTATIONAL PATHOLOGY | 2022 / Vol. 12039
Keywords
Pathologist navigation; Decision Support; Probabilistic Latent Semantic Analysis; Serous ovarian Cancer;
DOI
10.1117/12.2612984
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Machine learning techniques have shown great promise in digital pathology. However, a major bottleneck is the difficulty of annotating the amount of tissue needed to cover several sources of variability, namely chemical fixation, sample slicing, and staining. Models are usually trained on sets of small annotated image patches, but the number of patches required may grow exponentially, and the patches must still represent this variability. This paper presents a method for automatic sample selection to train a classifier for ovarian cancer by integrating a novel soft clustering strategy. The method starts by classifying a large set of patches with a previously trained classifier and dividing the patches from the cancer class into highly and moderately confident ones. An unsupervised selection of the moderately confident patches by Probabilistic Latent Semantic Analysis (PLSA) picks samples from relevant and meaningful groups with maximum within-group variance. A new model is then trained using the highly confident patches together with the patches obtained from the associated PLSA. This strategy outperforms a model trained with a larger set of annotated patches, while the training time and the number of samples are much smaller. The strategy was evaluated on a set of patches from 18 patients with serous ovarian cancer, obtaining a reduction of 54.62% in training time and 73.66% in the number of samples, while the recall rate improved from 0.69 to 0.73.
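The selection pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the confidence thresholds (0.9 and 0.5), the number of topics, the rule of keeping the groups with the largest within-group variance, the function names, and the random stand-in data are all assumptions made for illustration. PLSA is fitted here with a plain EM loop on a non-negative patch-by-feature count matrix.

```python
import numpy as np

def split_by_confidence(probs, high=0.9, low=0.5):
    """Split cancer-class probabilities into high/moderate confidence.
    The thresholds are illustrative assumptions, not values from the paper."""
    probs = np.asarray(probs)
    high_idx = np.flatnonzero(probs >= high)
    moderate_idx = np.flatnonzero((probs >= low) & (probs < high))
    return high_idx, moderate_idx

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a non-negative (patches x words) count matrix.
    Returns P(z|d) with shape (patches, topics) and P(w|z) (topics, words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (d, z, w)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from weighted counts
        weighted = counts[:, None, :] * post               # (d, z, w)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

def select_from_groups(features, p_z_d, n_groups=2, per_group=5):
    """Keep the n_groups topics with the largest within-group variance and
    take up to per_group patches from each (an assumed selection rule)."""
    labels = p_z_d.argmax(axis=1)          # hard assignment for grouping only
    variances = {}
    for z in np.unique(labels):
        members = features[labels == z]
        if len(members) > 1:
            variances[z] = members.var(axis=0).sum()
    top = sorted(variances, key=variances.get, reverse=True)[:n_groups]
    chosen = []
    for z in top:
        idx = np.flatnonzero(labels == z)
        order = np.argsort(-p_z_d[idx, z])  # strongest topic membership first
        chosen.extend(idx[order[:per_group]].tolist())
    return np.array(sorted(chosen), dtype=int)

# Random stand-in data: classifier outputs and bag-of-features counts.
rng = np.random.default_rng(0)
probs = rng.random(200)
features = rng.integers(0, 10, size=(200, 16)).astype(float)

high_idx, mod_idx = split_by_confidence(probs)
p_z_d, p_w_z = plsa(features[mod_idx], n_topics=4)
picked = mod_idx[select_from_groups(features[mod_idx], p_z_d)]
train_idx = np.concatenate([high_idx, picked])  # samples for the new model
```

In this sketch `train_idx` would index the patches used to train the new model: every highly confident patch plus the subset of moderately confident patches chosen through PLSA. The hard `argmax` assignment is a simplification; a soft selection could weight patches by `p_z_d` instead.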
Pages: 5