Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding

Cited by: 5
Authors
Fang, Xian [1]
Tie, Zhixin [1]
Guan, Yinan [1]
Rao, Shanshan [1]
Affiliation
[1] Zhejiang Sci-Tech University, School of Information Science and Technology, Hangzhou, Zhejiang, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Data clustering; Quasi-cluster centers clustering; Potential entropy; Optimal parameter; t-distributed stochastic neighbor embedding; DENSITY PEAKS; FAST SEARCH; FIND; REDUCTION; ROCK;
DOI
10.1007/s00500-018-3221-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A novel density-based clustering algorithm named QCC was recently presented. Although the algorithm has proved to be highly robust, its two input parameters, the number of neighbors (k) and the similarity threshold value (), still have to be determined manually, which severely limits its wider adoption. In addition, QCC does not perform well on datasets of relatively high dimensionality. To overcome these defects, we first define a new method for computing local density and introduce the strategy of potential entropy into the original algorithm. Based on this idea, we propose a new QCC clustering algorithm (QCC-PE). QCC-PE automatically extracts the optimal value of the parameter k by optimizing the potential entropy of the data field. In this way, the parameter is calculated objectively from the dataset itself rather than estimated empirically from a large number of experiments. We then apply t-distributed stochastic neighbor embedding (tSNE) to the QCC-PE model, yielding a further method (QCC-PE-tSNE) that preprocesses high-dimensional datasets by dimensionality reduction. We compare the performance of the proposed algorithms with QCC, DBSCAN, and DP on synthetic datasets, the Olivetti Face Database, and real-world datasets. Experimental results show that our algorithms are feasible and effective and often outperform the compared algorithms.
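The parameter-selection idea in the abstract (choosing k by optimizing the potential entropy of a data field) can be illustrated with a short sketch. It assumes the commonly used Gaussian data-field potential phi_i = sum_j exp(-(d_ij / sigma)^2) and the potential entropy H = -sum_i (phi_i / Z) ln(phi_i / Z) with Z = sum_i phi_i, which reaches its maximum ln n when all potentials are equal. The abstract does not state how QCC-PE links k to the data field or how its new local density is defined. Here each candidate k is mapped to an impact factor sigma equal to the mean k-th-nearest-neighbor distance, and the k with the lowest entropy is kept. This mapping and all function names (`pairwise_distances`, `potential_entropy`, `kth_neighbor_scale`, `select_k`) are hypothetical, so this is not the authors' implementation.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def potential_entropy(dist, sigma):
    """Entropy of the normalized Gaussian data-field potentials for impact factor sigma."""
    n = len(dist)
    # Potential of point i: Gaussian contributions from all points, itself included,
    # so every potential is at least 1 and the logarithm below is always defined.
    phi = [sum(math.exp(-(dist[i][j] / sigma) ** 2) for j in range(n)) for i in range(n)]
    z = sum(phi)
    return -sum((p / z) * math.log(p / z) for p in phi)


def kth_neighbor_scale(dist, k):
    """Hypothetical k -> sigma mapping: mean distance to each point's k-th nearest neighbor."""
    # Index 0 of each sorted row is the point itself (distance 0), so index k
    # is the k-th nearest neighbor.
    return sum(sorted(row)[k] for row in dist) / len(dist)


def select_k(points, k_values):
    """Return (k, entropy) for the candidate k whose data field has minimum potential entropy."""
    dist = pairwise_distances(points)
    best_k, best_h = None, math.inf
    for k in k_values:
        if not 1 <= k < len(points):
            raise ValueError("each candidate k must satisfy 1 <= k < number of points")
        sigma = kth_neighbor_scale(dist, k)
        if sigma <= 0:  # duplicate points can give a zero scale; skip that candidate
            continue
        h = potential_entropy(dist, sigma)
        if h < best_h:
            best_k, best_h = k, h
    return best_k, best_h


# Usage: a dense group of four points plus three scattered points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (5.0, 5.0), (9.0, 1.0), (2.0, 8.0)]
k, h = select_k(points, range(1, 6))
print(f"selected k = {k}, potential entropy = {h:.4f}")
```

Minimizing the entropy favors the scale at which potentials are least uniform, so dense regions stand out most clearly from sparse ones. A very small or a very large sigma pushes every potential toward the same value and the entropy toward its maximum ln n.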
Pages: 5645-5657
Number of pages: 13