STRATEGIES FOR ONLINE INFERENCE OF MODEL-BASED CLUSTERING IN LARGE AND GROWING NETWORKS

Cited by: 19
Authors
Zanghi, Hugo [1]
Picard, Franck [2]
Miele, Vincent [2]
Ambroise, Christophe [3]
Affiliations
[1] Exalead, F-75008 Paris, France
[2] UCB Lyon 1, Lab Biometrie & Biol Evolut, F-69622 Villeurbanne, France
[3] CNRS, INRA, Lab Stat & Genome, UEVE 1152, UMR 8071, F-91000 Evry, France
Source
ANNALS OF APPLIED STATISTICS | 2010 / Vol. 4 / Issue 02
Keywords
Graph clustering; EM Algorithms; online strategies; web graph structure analysis; MIXED MEMBERSHIP; EM ALGORITHM; MIXTURE; CONVERGENCE; PREDICTION;
DOI
10.1214/10-AOAS359
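Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes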
020208 ; 070103 ; 0714 ;
Abstract
In this paper we adapt online estimation strategies to perform model-based clustering on large networks. Our work focuses on two algorithms, the first based on the SAEM algorithm, and the second on variational methods. These two strategies are compared with existing approaches on simulated and real data. We use the method to decipher the connexion structure of the political websphere during the US political campaign in 2008. We show that our online EM-based algorithms offer a good trade-off between precision and speed, when estimating parameters for mixture distributions in the context of random graphs.
Pages: 687-714
Number of pages: 28