Hypothesis testing for automated community detection in networks

Cited by: 112
Authors
Bickel, Peter J. [1]
Sarkar, Purnamrita [2]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
Funding
National Science Foundation (USA)
Keywords
Asymptotic analysis; Community detection; Hypothesis testing; Networks; Stochastic block model; Tracy-Widom distribution; STOCHASTIC BLOCKMODELS; UNIVERSALITY; EIGENVALUES; MODEL; EDGE;
DOI
10.1111/rssb.12117
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters k. We propose to determine k automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold. First, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and use that distribution for our test of the hypothesis that a random graph is of Erdős-Rényi (noise) type. Secondly, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real-world networks with ground truth, we show that our algorithm outperforms state-of-the-art methods.
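The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the centring subtracts the estimated edge probability off the diagonal, the scaling is sqrt((n-1) p(1-p)), the statistic is n^(2/3) (lambda_1 - 2), the split uses the sign of the leading eigenvector, and the critical value 0.98 is an approximate 95% Tracy-Widom (beta = 1) quantile. The helper names `sample_sbm`, `er_test` and `recursive_bipartition` are hypothetical, and no small-sample correction is applied.

```python
import numpy as np

# Assumed approximate 95% quantile of the Tracy-Widom (beta = 1) law;
# verify against a trusted table before relying on it.
TW1_CRIT_95 = 0.98


def sample_sbm(sizes, p_in, p_out, seed=0):
    """Hypothetical helper: draw a symmetric 0/1 adjacency matrix from a block model."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu((rng.random(P.shape) < P).astype(float), k=1)
    return upper + upper.T, labels


def er_test(A):
    """Return (statistic, leading eigenvector) for the null 'A is Erdos-Renyi'."""
    n = A.shape[0]
    p_hat = A.sum() / (n * (n - 1))          # estimated edge probability
    if p_hat <= 0.0 or p_hat >= 1.0:         # empty or complete graph: nothing to test
        return -np.inf, None
    P_hat = p_hat * (np.ones((n, n)) - np.eye(n))
    M = (A - P_hat) / np.sqrt((n - 1) * p_hat * (1.0 - p_hat))
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    theta = n ** (2.0 / 3.0) * (vals[-1] - 2.0)
    return theta, vecs[:, -1]


def recursive_bipartition(A, nodes=None, crit=TW1_CRIT_95, min_size=10):
    """Split while the Erdos-Renyi null is rejected; return a list of node arrays."""
    if nodes is None:
        nodes = np.arange(A.shape[0])
    if len(nodes) < min_size:
        return [nodes]
    theta, v = er_test(A[np.ix_(nodes, nodes)])
    if v is None or theta <= crit:           # looks like noise: stop splitting
        return [nodes]
    left, right = nodes[v >= 0], nodes[v < 0]
    if len(left) == 0 or len(right) == 0:
        return [nodes]
    return (recursive_bipartition(A, left, crit, min_size)
            + recursive_bipartition(A, right, crit, min_size))


if __name__ == "__main__":
    A, labels = sample_sbm([100, 100], p_in=0.5, p_out=0.05, seed=2)
    theta, _ = er_test(A)
    parts = recursive_bipartition(A)
    print("statistic:", theta, "communities found:", len(parts))
```

Because convergence to the limiting distribution is slow at small n, a sketch like this may over-split; the number of communities it reports should be read as illustrative rather than as a reproduction of the paper's results.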
Pages: 253-273
Number of pages: 21