An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs

Cited by: 44
Authors
Shao, Yingxia [1 ]
Cui, Bin [1 ]
Chen, Lei [2 ]
Liu, Mingming [1 ]
Xie, Xing [3 ]
Affiliations
[1] Peking Univ, Sch EECS, Key Lab High Confidence Software Technol MOE, Beijing, Peoples R China
[2] HKUST, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
[3] Microsoft Res, New York, NY USA
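Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2015 / Vol. 8 / Issue 08
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
DOI
10.14778/2757807.2757809
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract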
SimRank is an important measure of vertex-pair similarity based on graph structure. SimRank-based similarity search is a key operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Real-world graphs are now much larger and more dynamic, and existing solutions for similarity search are expensive in both time and space; none of them can efficiently support similarity search over large dynamic graphs. In this paper, we propose a novel two-stage random-walk sampling framework (TSF) for SimRank-based similarity search (e.g., top-k search). In the preprocessing stage, TSF samples a set of one-way graphs to index raw random walks in a novel manner within $O(N R_g)$ time and space, where $N$ is the number of vertices and $R_g$ is the number of one-way graphs. The one-way graphs can be efficiently updated as the graph is modified, so TSF is well suited to dynamic graphs. During the query stage, TSF searches for similar vertices quickly by naturally pruning unqualified vertices based on the connectivity of the one-way graphs. Furthermore, with $R_q$ additional samples, TSF can estimate the SimRank score with probability $1 - 2e^{-2\epsilon^2 R_g R_q/(1-c)^2}$ when the approximation error is bounded in terms of $\epsilon$. Finally, to guarantee the scalability of TSF, the one-way graphs can also be stored compactly on disk when memory is limited. Extensive experiments demonstrate that TSF can handle dynamic billion-edge graphs with high performance.
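To make the sampling idea in the abstract concrete, the following is a minimal Python sketch, not the authors' TSF implementation. It assumes that a one-way graph keeps one randomly chosen in-neighbor per vertex, and that a SimRank score can be approximated by averaging $c^t$ over the sampled graphs, where $t$ is the first step at which the walks from the two vertices meet. It leaves out the $R_q$ query-time samples and any handling of correlations between coupled walks. All function names, the decay value `c=0.6`, and the toy graph are hypothetical; `confidence_lower_bound` simply evaluates the probability expression quoted in the abstract.

```python
import math
import random


def sample_one_way_graph(in_neighbors, rng):
    """Keep one randomly chosen in-neighbor per vertex (None if it has none)."""
    return {v: (rng.choice(nbrs) if nbrs else None)
            for v, nbrs in in_neighbors.items()}


def first_meeting_step(one_way, u, v, max_steps):
    """Follow the sampled in-neighbor pointers from u and v in lockstep.

    Returns the first step at which both walks reach the same vertex,
    or None if a walk dies out or max_steps is exhausted.
    """
    for step in range(1, max_steps + 1):
        u, v = one_way.get(u), one_way.get(v)
        if u is None or v is None:
            return None
        if u == v:
            return step
    return None


def estimate_simrank(in_neighbors, u, v, c=0.6, num_graphs=200,
                     max_steps=10, seed=0):
    """Average c**t over num_graphs sampled one-way graphs (simplified)."""
    if u == v:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_graphs):
        one_way = sample_one_way_graph(in_neighbors, rng)
        step = first_meeting_step(one_way, u, v, max_steps)
        if step is not None:
            total += c ** step
    return total / num_graphs


def confidence_lower_bound(eps, r_g, r_q, c):
    """Evaluate 1 - 2*exp(-2*eps^2*R_g*R_q/(1-c)^2) from the abstract."""
    return 1.0 - 2.0 * math.exp(-2.0 * eps ** 2 * r_g * r_q / (1.0 - c) ** 2)


# Toy graph (hypothetical): vertex "a" points to both "b" and "c".
toy_in_neighbors = {"a": [], "b": ["a"], "c": ["a"]}
score_bc = estimate_simrank(toy_in_neighbors, "b", "c")
score_ab = estimate_simrank(toy_in_neighbors, "a", "b")
bound = confidence_lower_bound(eps=0.1, r_g=100, r_q=20, c=0.6)
```

In this toy graph, "b" and "c" share their only in-neighbor, so every sampled one-way graph makes the two walks meet after one step. A dynamic update in this simplified model would only need to resample the pointer of the vertex whose in-neighbor list changed, which is the property the abstract points to when it says one-way graphs are easy to maintain.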
Pages: 838 - 849
Number of pages: 12
Related papers
50 in total
  • [31] Semantic SPARQL Similarity Search Over RDF Knowledge Graphs
    Zheng, Weiguo
    Zou, Lei
    Peng, Wei
    Yan, Xifeng
    Song, Shaoxu
    Zhao, Dongyan
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (11): : 840 - 851
  • [32] SimRank*: effective and scalable pairwise similarity search based on graph topology
    Yu, Weiren
    Lin, Xuemin
    Zhang, Wenjie
    Pei, Jian
    McCann, Julie A.
    VLDB JOURNAL, 2019, 28 (03): : 401 - 426
  • [33] Towards Efficient SimRank Computation on Large Networks
    Yu, Weiren
    Lin, Xuemin
    Zhang, Wenjie
    2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013, : 601 - 612
  • [34] SimRank*: effective and scalable pairwise similarity search based on graph topology
    Weiren Yu
    Xuemin Lin
    Wenjie Zhang
    Jian Pei
    Julie A. McCann
    The VLDB Journal, 2019, 28 : 401 - 426
  • [35] Efficient similarity search by summarization in large video database
    Zhou, Xiangmin
    Zhou, Xiaofang
    Shen, Heng Tao
    Conferences in Research and Practice in Information Technology Series, 2007, 63 : 161 - 167
  • [36] An Efficient Similarity Search in Large Data Collections with MapReduce
    Trong Nhan Phan
    Kueng, Josef
    Tran Khanh Dang
    FUTURE DATA AND SECURITY ENGINEERING, FDSE 2014, 2014, 8860 : 44 - 57
  • [37] Efficient Similarity Search in Very Large String Sets
    Fenz, Dandy
    Lange, Dustin
    Rheinlaender, Astrid
    Naumann, Felix
    Leser, Ulf
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2012, 2012, 7338 : 262 - 279
  • [38] Efficient similarity search for hierarchical data in large databases
    Kailing, K
    Kriegel, HP
    Schönauer, S
    Seidl, T
    ADVANCES IN DATABASE TECHNOLOGY - EDBT 2004, PROCEEDINGS, 2004, 2992 : 676 - 693
  • [39] Exact Single-Source SimRank Computation on Large Graphs
    Wang, Hanzhi
    Wei, Zhewei
    Yuan, Ye
    Du, Xiaoyong
    Wen, Ji-Rong
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 653 - 663
  • [40] Scalable Single-source SimRank Computation for Large Graphs
    Gao, Xingkun
    Bao, Nianyuan
    Liu, Jie
    Tang, Jie
    Wu, Gangshan
    2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016, : 1083 - 1091