Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data

Authors
Dinari, Or [1]
Freifeld, Oren [1]
Affiliations
[1] Ben Gurion Univ Negev, Beer Sheva, Israel
Funding
Israel Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Practical tools for clustering streaming data must be fast enough to handle the arrival rate of the observations. Typically, they also must adapt on the fly to possible lack of stationarity; i.e., the data statistics may be time-dependent due to various forms of drifts, changes in the number of clusters, etc. The Dirichlet Process Mixture Model (DPMM), whose Bayesian nonparametric nature allows it to adapt its complexity to the data, seems a natural choice for the streaming-data case. In its classical formulation, however, the DPMM cannot capture common types of drifts in the data statistics. Moreover, and regardless of that limitation, existing methods for online DPMM inference are too slow to handle rapid data streams. In this work we propose adapting both the DPMM and a known DPMM sampling-based non-streaming inference method for streaming-data clustering. We demonstrate the utility of the proposed method on several challenging settings, where it obtains state-of-the-art results while being on par with other methods in terms of speed.
Pages: 818-835
Page count: 18