Model-based clustering for social networks

Cited by: 415
Authors
Handcock, Mark S.
Raftery, Adrian E.
Tantrum, Jeremy M.
Affiliations
[1] Univ Washington, Ctr Stat & Social Sci, Seattle, WA 98195 USA
[2] Microsoft adCtr Labs, Redmond, WA USA
Keywords
Bayes factor; dyad; latent space; Markov chain Monte Carlo methods; mixture model; transitivity
DOI
10.1111/j.1467-985X.2007.00471.x
Chinese Library Classification (CLC)
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
Network models are widely used to represent relations between interacting units or actors. Network data often exhibit three features: transitivity (two actors that have ties to a third actor are more likely to be tied than actors that do not), homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean 'social space', and the actors' locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package for the R statistical language, latentnet, is available for analysing data with the model.
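The abstract notes that the model makes it easy to simulate realistic networks with clustering. The following is a minimal Python sketch of such a simulation under stated assumptions; it is not the latentnet implementation and does not attempt estimation. It assumes an undirected network, a two-dimensional latent space, no actor or dyad covariates, and a logistic link in which the log-odds of a tie equal an intercept beta0 minus the Euclidean distance between the two actors' latent positions. Latent positions are drawn from a mixture of spherical Gaussians whose mixing weights, means and standard deviations are hypothetical inputs. The helper names simulate_lpcm, tie_density and transitivity are hypothetical. The transitivity function computes the global clustering coefficient (the share of two-paths i-k-j closed by an i-j tie), one common way to quantify the transitivity described above.

```python
import math
import random


def simulate_lpcm(n, weights, means, sigmas, beta0, seed=0):
    """Simulate an undirected network from a latent position cluster model.

    Assumptions (for illustration only): 2-D latent space, no covariates,
    logit P(tie i-j) = beta0 - ||z_i - z_j||, and each z_i drawn from a
    mixture of spherical Gaussians with the given weights, means, sigmas.
    """
    rng = random.Random(seed)
    # Draw a cluster label for each actor according to the mixing weights.
    groups = rng.choices(range(len(weights)), weights=weights, k=n)
    # Draw each actor's latent position from its cluster's Gaussian.
    z = [(rng.gauss(means[g][0], sigmas[g]), rng.gauss(means[g][1], sigmas[g]))
         for g in groups]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Closer actors in latent space have higher tie probability.
            p = 1.0 / (1.0 + math.exp(-(beta0 - math.dist(z[i], z[j]))))
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return groups, z, adj


def tie_density(adj, groups, within):
    """Share of tied dyads among within-group (True) or between-group (False) dyads."""
    n = len(adj)
    ties = pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (groups[i] == groups[j]) == within:
                pairs += 1
                ties += adj[i][j]
    return ties / pairs if pairs else 0.0


def transitivity(adj):
    """Global clustering coefficient: closed two-paths / all two-paths."""
    n = len(adj)
    closed = 0   # ordered two-paths a-k-b whose endpoints a, b are also tied
    triples = 0  # ordered two-paths a-k-b with a != b
    for k in range(n):
        nbrs = [v for v in range(n) if adj[k][v]]
        d = len(nbrs)
        triples += d * (d - 1)
        for a in range(d):
            for b in range(d):
                if a != b and adj[nbrs[a]][nbrs[b]]:
                    closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    # Two hypothetical, well-separated clusters in the latent space.
    groups, z, adj = simulate_lpcm(
        n=60, weights=[0.5, 0.5], means=[(-2.0, 0.0), (2.0, 0.0)],
        sigmas=[0.5, 0.5], beta0=1.0, seed=1,
    )
    print("within-group density:", round(tie_density(adj, groups, True), 3))
    print("between-group density:", round(tie_density(adj, groups, False), 3))
    print("transitivity:", round(transitivity(adj), 3))
```

With two well-separated hypothetical clusters, the within-group tie density should come out well above the between-group density, which is the kind of clustering the model is meant to represent. Fitting the model to observed data is what the two estimation methods and latentnet address; this sketch covers only simulation.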
Pages: 301-322
Number of pages: 22