A Framework for Feature Selection in Clustering

Cited by: 422
Authors
Witten, Daniela M. [1]
Tibshirani, Robert [1, 2]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Funding
US National Science Foundation; US National Institutes of Health
Keywords
Hierarchical clustering; High-dimensional; K-means clustering; Lasso; Model selection; Sparsity; Unsupervised learning; VARIABLE SELECTION; PRINCIPAL-COMPONENTS; OBJECTS; NUMBER;
DOI
10.1198/jasa.2010.tm09415
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated and genomic data.
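The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way sparse K-means with a lasso-type penalty can be set up. It assumes nonnegative feature weights w with an L2 bound of 1 and an L1 bound s (s >= 1), and alternates between K-means on reweighted features and a soft-thresholded weight update. All function names (`soft_threshold`, `kmeans`, `per_feature_bcss`, `update_weights`, `sparse_kmeans`) and the plain Lloyd-style K-means are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations with random initial centers (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def per_feature_bcss(X, labels, k):
    """Between-cluster sum of squares for each feature (total minus within)."""
    tss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    wcss = np.zeros(X.shape[1])
    for c in range(k):
        Xc = X[labels == c]
        if len(Xc):
            wcss += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return np.maximum(tss - wcss, 0.0)

def update_weights(bcss, s, n_bisect=60):
    """Maximize sum_j w_j * bcss_j s.t. ||w||_2 <= 1, ||w||_1 <= s, w >= 0.

    Assumes s >= 1 and that bcss is not identically zero.
    """
    w = bcss / np.linalg.norm(bcss)
    if w.sum() <= s:
        return w  # the L1 bound is inactive, so no thresholding is needed
    # Fallback: all weight on the single best feature (feasible when s >= 1).
    best = np.zeros_like(bcss)
    best[np.argmax(bcss)] = 1.0
    lo, hi = 0.0, bcss.max()
    for _ in range(n_bisect):  # bisection on the threshold delta
        delta = 0.5 * (lo + hi)
        w = soft_threshold(bcss, delta)
        norm = np.linalg.norm(w)
        if norm == 0.0:
            hi = delta
            continue
        w = w / norm
        if w.sum() > s:
            lo = delta
        else:
            best, hi = w, delta
    return best

def sparse_kmeans(X, k, s, n_outer=10):
    """Alternate between clustering with fixed weights and updating weights."""
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(n_outer):
        # K-means on sqrt(w)-scaled features minimizes the weighted WCSS.
        labels = kmeans(X * np.sqrt(w), k)
        w = update_weights(per_feature_bcss(X, labels, k), s)
    return w, labels
```

Calling `sparse_kmeans(X, k=2, s=1.3)` on an n-by-p array returns the feature weights and cluster labels; features whose weight is exactly zero play no part in the clustering. Smaller values of s give sparser weight vectors, and how s should be chosen is left open in this sketch.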
Pages: 713-726
Page count: 14
Related papers
50 in total
  • [21] A Clustering based Selection Framework for Cost Aware and Test-time Feature Elicitation
    Das, Srijita
    Iyer, Rishabh
    Natarajan, Sriraam
    CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD), 2021, : 20 - 28
  • [22] Feature selection based on partition clustering
    Liu, Shuang
    Zhao, Qiang
    Wu, Xiang
    INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2014, 18 (02) : 135 - 142
  • [23] A filter feature selection method for clustering
    Jouve, PE
    Nicoloyannis, N
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2005, 3488 : 583 - 593
  • [24] A survey on feature selection approaches for clustering
    Emrah Hancer
    Bing Xue
    Mengjie Zhang
    Artificial Intelligence Review, 2020, 53 : 4519 - 4545
  • [25] A Local Feature Selection Approach for Clustering
    Gui, Bing
    FOUNDATIONS OF INTELLIGENT SYSTEMS (ISKE 2011), 2011, 122 : 55 - 62
  • [26] Feature selection for clustering - A filter solution
    Dash, M
    Choi, K
    Scheuermann, P
    Liu, H
    2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 115 - 122
  • [27] Bayesian Feature Selection for Clustering Problems
    Hruschka, Eduardo
    Hruschka, Estevam, Jr.
    Covoes, Thiago
    Ebecken, Nelson
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2006, 5 (04) : 315 - 327
  • [28] Local Feature Selection in Text Clustering
    Ribeiro, Marcelo N.
    Neto, Manoel J. R.
    Prudencio, Ricardo B. C.
    ADVANCES IN NEURO-INFORMATION PROCESSING, PT II, 2009, 5507 : 45 - +
  • [29] Feature Selection for Clustering Online Learners
    Huang, Lei
    Wang, Xinghui
    Wu, Zhouhua
    Wang, Feiyu
    2019 EIGHTH INTERNATIONAL CONFERENCE ON EDUCATIONAL INNOVATION THROUGH TECHNOLOGY (EITT), 2019, : 1 - 6
  • [30] Feature Selection Embedded Subspace Clustering
    Peng, Chong
    Kang, Zhao
    Yang, Ming
    Cheng, Qiang
    IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (07) : 1018 - 1022