Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity

Cited by: 39
Authors
Peng, Dehua [1, 2, 3, 4]
Gui, Zhipeng [2, 3, 4]
Wang, Dehe [5, 6]
Ma, Yuncheng [2, 3]
Huang, Zichen [2, 3]
Zhou, Yu [5, 6]
Wu, Huayi [1, 3, 4]
Affiliations
[1] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China
[3] Wuhan Univ, Collaborat Innovat Ctr Geospatial Technol, Wuhan, Peoples R China
[4] Hubei Luojia Lab, Wuhan, Peoples R China
[5] Wuhan Univ, Coll Life Sci, Modern Virol Res Ctr, State Key Lab Virol, Wuhan, Peoples R China
[6] Wuhan Univ, Frontier Sci Ctr Immunol & Metab, Wuhan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
TRANSCRIPTOMIC CELL-TYPES; RNA-SEQ; FLOW; IDENTIFICATION; SPACE; ALGORITHM; EFFICIENT; CRITERIA; TOOL;
DOI
10.1038/s41467-022-33136-9
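Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract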
Clustering is a powerful machine learning method for discovering similar patterns according to the proximity of elements in feature space. It is widely used in computer science, bioscience, geoscience, and economics. Although state-of-the-art partition-based and connectivity-based clustering methods have been developed, weak connectivity and heterogeneous density in data impede their effectiveness. In this work, we propose a boundary-seeking Clustering algorithm using the local Direction Centrality (CDC). It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages to bind the connections of internal points, thereby preventing cross-cluster connections and separating weakly connected clusters. We demonstrate the validity of CDC by detecting complex structured clusters in challenging synthetic datasets, identifying cell types from single-cell RNA sequencing (scRNA-seq) and mass cytometry (CyTOF) data, recognizing speakers on voice corpora, and testing on various types of real-world benchmarks.
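The abstract does not give the formula for the density-independent metric, so the following is only a minimal sketch of one way such a score could be computed in two dimensions. It assumes the score is the variance of the angular gaps between a point's K nearest neighbors: neighbors spread evenly around a point give a score near zero (internal point), while neighbors crowded to one side give a large score (boundary point). The function name `direction_centrality`, the brute-force neighbor search, and the 2D restriction are assumptions made for illustration, not details taken from this record.

```python
import numpy as np

def direction_centrality(points, k):
    """Hypothetical 2D score: variance of the angular gaps between
    each point's k nearest neighbors (an assumed form of the metric)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Brute-force pairwise squared distances; fine for small examples.
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbors
    scores = np.empty(n)
    for i in range(n):
        vecs = pts[knn[i]] - pts[i]
        # Directions of the neighbors as seen from point i, sorted by angle.
        ang = np.sort(np.arctan2(vecs[:, 1], vecs[:, 0]))
        # Gaps between consecutive directions, wrapping around the circle.
        gaps = np.diff(np.append(ang, ang[0] + 2 * np.pi))
        # Evenly surrounded points have all gaps equal to 2*pi/k.
        scores[i] = np.mean((gaps - 2 * np.pi / k) ** 2)
    return scores

# A 3x3 grid: the center is surrounded on all sides, the corners are not.
grid = [(x, y) for x in range(3) for y in range(3)]
scores = direction_centrality(grid, k=4)
```

Because the score depends only on neighbor directions and not on neighbor distances, it is unchanged when the data are uniformly rescaled, which is the sense in which a measure of this kind is independent of density. Splitting points into internal and boundary sets would then need a threshold on the score, which is not specified here.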
Pages: 14