Towards Efficient SimRank Computation on Large Networks

Cited by: 0
Authors
Yu, Weiren [1]
Lin, Xuemin [1]
Zhang, Wenjie [1]
Affiliations
[1] Univ New S Wales, Sydney, NSW, Australia
Keywords
ALGORITHMS;
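DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract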
SimRank has been a powerful model for assessing the similarity of pairs of vertices in a graph. It is based on the concept that two vertices are similar if they are referenced by similar vertices. Due to its self-referentiality, fast SimRank computation on large graphs poses significant challenges. The state-of-the-art work [17] exploits partial sums memorization for computing SimRank in $O(Kmn)$ time on a graph with $n$ vertices and $m$ edges, where $K$ is the number of iterations. Partial sums memorizing can reduce repeated calculations by caching part of similarity summations for later reuse. However, we observe that computations among different partial sums may have duplicate redundancy. Besides, for a desired accuracy $\epsilon$, the existing SimRank model requires $K = \lceil \log_C \epsilon \rceil$ iterations [17], where $C$ is a damping factor. Nevertheless, such a geometric rate of convergence is slow in practice if a high accuracy is desirable. In this paper, we address these gaps. (1) We propose an adaptive clustering strategy to eliminate partial sums redundancy (i.e., duplicate computations occurring in partial sums), and devise an efficient algorithm for speeding up the computation of SimRank to $O(Kd'n^2)$ time, where $d'$ is typically much smaller than the average in-degree of a graph. (2) We also present a new notion of SimRank that is based on a differential equation and can be represented as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) Using real and synthetic data, we empirically verify that our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude, and that our revised notion of SimRank further achieves a 5X speedup on large graphs while also fairly preserving the relative order of original SimRank scores.
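As an illustration of the baseline the abstract refers to, the following is a minimal Python sketch of the conventional SimRank iteration with partial sums caching, together with the iteration bound K = ceil(log_C epsilon). It is not the authors' adaptive clustering algorithm, and it does not implement the differential-equation variant. The function names, the dictionary-based graph representation, and the default values C = 0.6 and K = 5 are assumptions made for this example only.

```python
import math


def iterations_for_accuracy(C, eps):
    """Iteration bound K = ceil(log_C eps) quoted in the abstract."""
    return math.ceil(math.log(eps) / math.log(C))


def simrank_partial_sums(in_nbrs, C=0.6, K=5):
    """Conventional SimRank with per-vertex partial sums caching.

    in_nbrs maps each vertex to the list of its in-neighbors
    (hypothetical representation chosen for this sketch).
    """
    nodes = list(in_nbrs)
    # s_0(a, b) = 1 if a == b, else 0
    s = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(K):
        new = {a: {} for a in nodes}
        for a in nodes:
            Ia = in_nbrs[a]
            # Partial sum over I(a): partial[y] = sum of s(i, y) for i in I(a).
            # It is computed once per vertex a and reused for every b below.
            partial = {y: sum(s[i][y] for i in Ia) for y in nodes}
            for b in nodes:
                Ib = in_nbrs[b]
                if a == b:
                    new[a][b] = 1.0
                elif not Ia or not Ib:
                    # A vertex without in-neighbors is similar only to itself.
                    new[a][b] = 0.0
                else:
                    total = sum(partial[j] for j in Ib)
                    new[a][b] = C * total / (len(Ia) * len(Ib))
        s = new
    return s


# Toy graph: c -> a and c -> b, so a and b share their only in-neighbor.
toy_graph = {"a": ["c"], "b": ["c"], "c": []}
scores = simrank_partial_sums(toy_graph, C=0.6, K=3)
```

In this sketch each partial sum costs O(n |I(a)|) to build and is then reused across all n choices of b, which is where the O(Kmn) bound of the baseline comes from. The redundancy the abstract points out arises when two vertices have overlapping in-neighbor sets, since their partial sums then repeat the same additions.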
Pages: 601 - 612
Page count: 12
Related papers
50 items in total
  • [41] Efficient multiparty computation for comparator networks
    Chida, Koji
    Kikuchi, Hiroaki
    Morohashi, Gembu
    Hirota, Keiichi
    ARES 2007: SECOND INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY, PROCEEDINGS, 2007, : 1183 - +
  • [42] Community discovery in large-scale complex networks using distributed SimRank nonnegative matrix factorization
    He, Chaobo
    Fei, Xiang
    Li, Hanchao
    Liu, Hai
    Tang, Yong
    Chen, Qimai
    2017 FIFTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD), 2017, : 226 - 231
  • [43] Towards Efficient MaxBRNN Computation for Streaming Updates
    Ning, Wentao
    Yan, Xiao
    Tang, Bo
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 2297 - 2302
  • [44] Efficient index-free SimRank similarity search in large graphs by discounting path lengths
    Zhang, Mingxi
    Yang, Liuqian
    Hu, Hangfei
    Liu, Tianxing
    Wang, Jinhua
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 206
  • [45] A Parallel Method for All-Pair SimRank Similarity Computation
    Huang, Xuan
    Gao, Xingkun
    Tang, Jie
    Wu, Gangshan
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT I, 2018, 11334 : 593 - 607
  • [46] SimFusion+: Extending SimFusion Towards Efficient Estimation on Large and Dynamic Networks
    Yu, Weiren
    Lin, Xuemin
    Zhang, Wenjie
    Zhang, Ying
    Le, Jiajin
    SIGIR 2012: PROCEEDINGS OF THE 35TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2012, : 365 - 374
  • [47] Towards energy-efficient storage placement in large scale sensor networks
    Xie, Lei
    Lu, Sanglu
    Cao, Yingchun
    Chen, Daoxu
    FRONTIERS OF COMPUTER SCIENCE, 2014, 8 (03) : 409 - 425
  • [48] Towards energy-efficient storage placement in large scale sensor networks
    Lei Xie
    Sanglu Lu
    Yingchun Cao
    Daoxu Chen
    Frontiers of Computer Science, 2014, 8 : 409 - 425
  • [49] Efficient SimRank-Based Similarity Join
    Zheng, Weiguo
    Zou, Lei
    Chen, Lei
    Zhao, Dongyan
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2017, 42 (03):
  • [50] Efficient computation of steady states in large-scale ODE models of biochemical reaction networks
    Lines, Glenn Terje
    Paszkowski, Lukasz
    Schmiester, Leonard
    Weindl, Daniel
    Stapor, Paul
    Hasenauer, Jan
    IFAC PAPERSONLINE, 2019, 52 (26): : 32 - 37