A Trie Based Set Similarity Query Algorithm

Cited by: 1
Authors
Jia, Lianyin [1 ,2 ]
Tang, Junzhuo [1 ]
Li, Mengjuan [3 ]
Li, Runxin [1 ]
Ding, Jiaman [1 ]
Chen, Yinong [4 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
[3] Yunnan Normal Univ, Lib, Kunming 650500, Peoples R China
[4] Arizona State Univ, Sch Comp & Augmented Intelligence, Tempe, AZ 85287 USA
Funding
National Natural Science Foundation of China;
Keywords
set similarity query; T-starTrie; FMNodes; TT-SSQ; EFFICIENT;
DOI
10.3390/math11010229
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most existing algorithms are based on inverted indexes; they usually filter unqualified sets one by one and lack sufficient support for duplicated sets, which leads to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie-based index for set similarity query. It naturally groups sets with the same prefix into one node and can filter all sets corresponding to that node at once, thereby significantly improving the efficiency of candidate generation. We find that the set similarity query problem can be transformed into the problem of detecting matching nodes of the first layer (FMNodes) on T-starTrie, and an efficient FMNode detection algorithm is designed accordingly. Based on this, an efficient set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
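The idea of grouping sets that share a prefix into one trie node, and discarding every set under a node in a single step, can be illustrated with a minimal sketch. This is not T-starTrie or TT-SSQ, whose details are not given in the abstract; it is a plain trie over sorted tokens with a simple Jaccard upper bound, and the names `TrieNode`, `build_trie`, and `query` are hypothetical. It assumes comparable, hashable tokens and a threshold greater than 0.

```python
from bisect import bisect_right


class TrieNode:
    def __init__(self):
        self.children = {}   # token -> TrieNode
        self.set_ids = []    # ids of sets ending here (duplicates share a node)


def build_trie(sets):
    """Insert each set as its sorted token sequence (illustrative index only)."""
    root = TrieNode()
    for sid, s in enumerate(sets):
        node = root
        for tok in sorted(set(s)):
            node = node.children.setdefault(tok, TrieNode())
        node.set_ids.append(sid)
    return root


def query(root, q, threshold):
    """Return ids of indexed sets whose Jaccard similarity to q is >= threshold."""
    q_sorted = sorted(set(q))
    q_set = set(q_sorted)
    n_q = len(q_sorted)
    results = []

    def dfs(node, depth, overlap):
        # depth = size of the set spelled by the path; overlap = tokens shared with q
        if node.set_ids and depth > 0:
            if overlap / (depth + n_q - overlap) >= threshold:
                results.extend(node.set_ids)
        for tok, child in node.children.items():
            o = overlap + (tok in q_set)
            d = depth + 1
            # Query tokens larger than tok are the only ones a descendant can still match.
            remaining = n_q - bisect_right(q_sorted, tok)
            # Any set below child has overlap <= o + remaining and union >= n_q + (d - o).
            upper = (o + remaining) / (n_q + d - o)
            if upper >= threshold:
                dfs(child, d, o)
            # Otherwise the whole subtree is filtered at once.

    dfs(root, 0, 0)
    return sorted(results)


sets = [[1, 2, 3], [1, 2, 3], [1, 2, 4], [5, 6], [2, 3]]
root = build_trie(sets)
print(query(root, [1, 2, 3], 0.5))
```

In this sketch the two duplicated sets `[1, 2, 3]` end at the same node and are accepted together, and the branch starting with token `5` is pruned without visiting `[5, 6]` individually.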
Pages: 13