An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs

Cited by: 44
Authors
Shao, Yingxia [1 ]
Cui, Bin [1 ]
Chen, Lei [2 ]
Liu, Mingming [1 ]
Xie, Xing [3 ]
Affiliations
[1] Peking Univ, Sch EECS, Key Lab High Confidence Software Technol MOE, Beijing, Peoples R China
[2] HKUST, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
[3] Microsoft Res, New York, NY USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2015 / Vol. 8 / No. 08
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
DOI
10.14778/2757807.2757809
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
SimRank is an important measure of vertex-pair similarity based on the structure of graphs. SimRank-based similarity search is an important operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Real-world graphs have become much larger and more dynamic. The existing solutions for similarity search are expensive in terms of time and space cost, and none of them can efficiently support similarity search over large dynamic graphs. In this paper, we propose a novel two-stage random-walk sampling framework (TSF) for SimRank-based similarity search (e.g., top-k search). In the preprocessing stage, TSF samples a set of one-way graphs to index raw random walks in a novel manner within $O(N R_g)$ time and space, where $N$ is the number of vertices and $R_g$ is the number of one-way graphs. The one-way graphs can be efficiently updated as the graph is modified, so TSF is well suited to dynamic graphs. During the query stage, TSF can search for similar vertices quickly by naturally pruning unqualified vertices based on the connectivity of the one-way graphs. Furthermore, with additional $R_q$ samples, TSF can estimate the SimRank score with probability $1 - 2e^{-2\epsilon^2 R_g R_q/(1-c)^2}$ for an approximation error bound parameterized by $\epsilon$. Finally, to guarantee the scalability of TSF, the one-way graphs can also be stored compactly on disk when memory is limited. Extensive experiments have demonstrated that TSF can handle dynamic billion-edge graphs with high performance.
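The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a one-way graph keeps one uniformly sampled in-neighbor per vertex, and it skips the second stage with the additional $R_q$ samples, so it only approximates the idea. The function names, the toy graph, and the default parameters are hypothetical. `success_probability` simply evaluates the probability expression quoted in the abstract.

```python
import math
import random


def sample_one_way_graph(in_neighbors, rng):
    """Keep one uniformly sampled in-neighbor per vertex (None if it has none).

    Assumption: this is the simplified notion of a one-way graph used here.
    """
    return {
        v: (rng.choice(nbrs) if nbrs else None)
        for v, nbrs in in_neighbors.items()
    }


def meeting_step(one_way, u, v, max_steps):
    """Return the first step t >= 1 at which the two reverse walks coincide."""
    for t in range(1, max_steps + 1):
        u, v = one_way.get(u), one_way.get(v)
        if u is None or v is None:
            return None  # one walk ran out of in-neighbors
        if u == v:
            return t
    return None


def estimate_simrank(in_neighbors, u, v, c=0.6, num_graphs=200,
                     max_steps=10, seed=0):
    """Average c**t over sampled one-way graphs, t being the first meeting step."""
    if u == v:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_graphs):
        one_way = sample_one_way_graph(in_neighbors, rng)
        t = meeting_step(one_way, u, v, max_steps)
        if t is not None:
            total += c ** t
    return total / num_graphs


def success_probability(eps, r_g, r_q, c):
    """Evaluate 1 - 2 * exp(-2 * eps^2 * R_g * R_q / (1 - c)^2)."""
    return 1.0 - 2.0 * math.exp(-2.0 * eps ** 2 * r_g * r_q / (1.0 - c) ** 2)


# Hypothetical toy graph: vertex -> list of in-neighbors.
toy_in_neighbors = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
score_bc = estimate_simrank(toy_in_neighbors, "b", "c", c=0.6)
prob = success_probability(eps=0.01, r_g=100, r_q=20, c=0.6)
```

In the toy graph, `b` and `c` share the single in-neighbor `a`, so every sampled one-way graph makes the two walks meet at step 1 and the estimate equals the decay factor `c`. Because a vertex stores only one sampled in-neighbor, an edge update only requires resampling the affected vertex, which is one way to read the abstract's claim about dynamic graphs.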
Pages: 838 - 849
Number of pages: 12
Related papers
50 in total
  • [41] Efficient Community Search over Large Directed Graphs: An Augmented Index-based Approach
    Chen, Yankai
    Zhan, Jie
    Fang, Yixiang
    Cao, Xin
    King, Irwin
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3544 - 3550
  • [42] Efficient Top-k s-Biplexes Search over Large Bipartite Graphs
    Xu, Zhenxiang
    Liu, Yiping
    Zhou, Yi
    Hao, Yimin
    Wang, Zhengren
    arXiv,
  • [43] Efficient Core Maintenance in Large Dynamic Graphs
    Li, Rong-Hua
    Yu, Jeffrey Xu
    Mao, Rui
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (10) : 2453 - 2465
  • [44] Efficient Influential Community Search in Large Uncertain Graphs
    Luo, Wensheng
    Zhou, Xu
    Li, Kenli
    Gao, Yunjun
    Li, Keqin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3779 - 3793
  • [45] Efficient Top-K SimRank-based Similarity Join
    Tao, Wenbo
    SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 1603 - 1604
  • [46] Effective Community Search over Large Spatial Graphs
    Fang, Yixiang
    Cheng, Reynold
    Li, Xiaodong
    Luo, Siqiang
    Hu, Jiafeng
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (06): : 709 - 720
  • [47] Reliable Community Search over Dynamic Bipartite Graphs
    Li, Mo
    Xie, Zhiran
    Dine, Linlin
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 298 - 307
  • [48] Efficient Top-K SimRank-based Similarity Join
    Tao, Wenbo
    Yu, Minghe
    Li, Guoliang
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 8 (03): : 317 - 328
  • [49] A FRAMEWORK FOR LARGE SCALE SEMANTIC SIMILARITY SEARCH ON SATELLITE IMAGERY
    Ramasubramanian, Muthukumaran
    Gurung, Iksha
    Thomas, Leo
    Berger, Kathryn
    Ranjan, Soumya
    Mok, Heidi
    Subramanian, Sowmya
    George, Vitor
    Maskey, Manil
    Ramachandran, Rahul
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 1404 - 1407
  • [50] Taming Computational Complexity: Efficient and Parallel SimRank Optimizations on Undirected Graphs
    Yu, Weiren
    Lin, Xuemin
    Le, Jiajin
    WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2010, 6184 : 280 - +