Top-k similarity search in heterogeneous information networks with x-star network schema

Cited by: 34
Authors
Zhang, Mingxi [1, 2]
Hu, Hao [2]
He, Zhenying [2]
Wang, Wei [2]
Affiliations
[1] Univ Shanghai Sci & Technol, Coll Commun & Art Design, Shanghai 200093, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai 201203, Peoples R China
Funding
National Science Foundation (USA)
Keywords
Similarity search; Information network; x-star network schema;
DOI
10.1016/j.eswa.2014.08.039
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An x-star network is an information network that consists of centers connected to one another, together with attributes of different types linked to these centers. As x-star networks become ubiquitous, extracting knowledge from them has become an important task. Similarity search in an x-star network aims to find the centers similar to a given query center, and has numerous applications, including collaborative filtering, community mining, and web search. Although existing methods such as SimRank and P-Rank yield promising results, they are not applicable to massive x-star networks. In this paper, we propose a structure-based similarity measure, NetSim, for efficiently computing the similarity between centers in an x-star network. The similarity between attributes is computed in the pre-processing stage as the expected meeting probability over the attribute network, which is extracted from the whole structure of the x-star network. The similarity between centers is computed online from the attribute similarities, based on the intuition that similar centers are linked to similar attributes. NetSim requires less time and space than existing methods because the attribute network is significantly smaller than the whole x-star network. To support fast online query processing, we develop a pruning algorithm that builds a pruning index and uses it to prune unpromising candidate centers. Extensive experiments comparing NetSim with state-of-the-art measures demonstrate its effectiveness and efficiency. © 2014 Elsevier Ltd. All rights reserved.
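The two-stage idea in the abstract (attribute similarities computed offline, center similarities derived from them online) can be illustrated with a minimal sketch. This is not the NetSim formula, which the abstract does not give: the averaging rule, the function names `attribute_similarity`, `center_similarity`, and `top_k_similar`, and the toy data are all assumptions made for illustration. The pruning index is omitted, so every candidate center is scored.

```python
import heapq
from itertools import product


def attribute_similarity(attr_sim, x, y):
    """Look up a precomputed attribute similarity (symmetric, 1.0 for identical attributes)."""
    if x == y:
        return 1.0
    return attr_sim.get((x, y), attr_sim.get((y, x), 0.0))


def center_similarity(attrs_a, attrs_b, attr_sim):
    """Assumed aggregation: mean pairwise similarity of the two centers' attributes."""
    if not attrs_a or not attrs_b:
        return 0.0
    total = sum(attribute_similarity(attr_sim, x, y)
                for x, y in product(attrs_a, attrs_b))
    return total / (len(attrs_a) * len(attrs_b))


def top_k_similar(query, center_attrs, attr_sim, k):
    """Score every other center against the query and return the k best as (score, center)."""
    scores = ((center_similarity(center_attrs[query], attrs, attr_sim), center)
              for center, attrs in center_attrs.items() if center != query)
    return heapq.nlargest(k, scores)


# Hypothetical toy data: centers p1..p3 linked to attributes a..d.
center_attrs = {"p1": {"a", "b"}, "p2": {"a", "c"}, "p3": {"d"}}
# Hypothetical attribute similarities, standing in for the pre-processing stage.
attr_sim = {("b", "c"): 0.5, ("a", "d"): 0.2}

print(top_k_similar("p1", center_attrs, attr_sim, k=2))
```

Because only the small attribute-similarity table is stored, the online step touches the query's attributes and each candidate's attributes rather than the whole network, which is the cost argument the abstract makes.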
Pages: 699-712
Number of pages: 14
Related papers
50 in total
  • [31] Top-K Interesting Subgraph Discovery in Information Networks
    Gupta, Manish
    Gao, Jing
    Yan, Xifeng
    Cam, Hasan
    Han, Jiawei
    2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 820 - 831
  • [32] Exact Top-k Nearest Keyword Search in Large Networks
    Jiang, Minhao
    Fu, Ada Wai-Chee
    Wong, Raymond Chi-Wing
    SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 393 - 404
  • [33] Scalable Top-k Spatial Image Search on Road Networks
    Zhao, Pengpeng
    Kuang, Xiaopeng
    Sheng, Victor S.
    Xu, Jiajie
    Wu, Jian
    Cui, Zhiming
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2015, PT II, 2015, 9050 : 379 - 396
  • [34] Top-k Nearest Keyword Search in Public Transportation Networks
    Huang, Wuwei
    Dai, Genan
    Ge, Youming
    Liu, Yubao
    2019 15TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG 2019), 2019, : 67 - 74
  • [35] Efficient top-k similarity document search utilizing distributed file systems and cosine similarity
    Mahmoud Alewiwi
    Cengiz Orencik
    Erkay Savaş
    Cluster Computing, 2016, 19 : 109 - 126
  • [36] Efficient top-k similarity document search utilizing distributed file systems and cosine similarity
    Alewiwi, Mahmoud
    Orencik, Cengiz
    Savas, Erkay
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2016, 19 (01): : 109 - 126
  • [37] Bidirectional String Anchors for Improved Text Indexing and Top-K Similarity Search
    Loukides, Grigorios
    Pissis, Solon P.
    Sweering, Michelle
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11) : 11093 - 11111
  • [38] Top-k query evaluation for schema-based peer-to-peer networks
    Nejdl, W
    Siberski, W
    Thaden, U
    Balke, WT
    SEMANTIC WEB - ISWC 2004, PROCEEDINGS, 2004, 3298 : 137 - 151
  • [39] Progressive Top-K Nearest Neighbors Search in Large Road Networks
    Ouyang, Dian
    Wen, Dong
    Qin, Lu
    Chang, Lijun
    Zhang, Ying
    Lin, Xuemin
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 1781 - 1795
  • [40] Why-not questions about spatial temporal top-k trajectory similarity search
    Luo, Changyin
    Dan, Tangpeng
    Li, Yanhong
    Meng, Xiaofeng
    Li, Guohui
    KNOWLEDGE-BASED SYSTEMS, 2021, 231