A Trie Based Set Similarity Query Algorithm

Cited by: 1
Authors
Jia, Lianyin [1 ,2 ]
Tang, Junzhuo [1 ]
Li, Mengjuan [3 ]
Li, Runxin [1 ]
Ding, Jiaman [1 ]
Chen, Yinong [4 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
[3] Yunnan Normal Univ, Lib, Kunming 650500, Peoples R China
[4] Arizona State Univ, Sch Comp & Augmented Intelligence, Tempe, AZ 85287 USA
Funding
National Natural Science Foundation of China;
Keywords
set similarity query; T-starTrie; FMNodes; TT-SSQ; EFFICIENT;
DOI
10.3390/math11010229
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most existing algorithms are based on inverted indexes. They usually filter unqualified sets one by one and offer little support for duplicated sets, which leads to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie-based index for set similarity query. T-starTrie naturally groups sets with the same prefix into one node and can filter all sets corresponding to that node at once, which significantly improves the efficiency of candidate generation. The paper shows that the set similarity query problem can be transformed into the problem of detecting matching nodes of the first layer (FMNodes) on T-starTrie, and designs an efficient FMNode detection algorithm accordingly. Building on this algorithm and a variety of filtering techniques, an efficient set similarity query algorithm, TT-SSQ, is implemented. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
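The idea of grouping sets that share a prefix and discarding a whole group at once can be illustrated with a minimal sketch. This is not T-starTrie or TT-SSQ, whose details the abstract does not give. It assumes a plain trie over sorted elements, Jaccard similarity as the measure, and a simple upper-bound test for pruning; the names `TrieNode`, `build_trie`, and `jaccard_query` are hypothetical.

```python
from bisect import bisect_right


class TrieNode:
    def __init__(self):
        self.children = {}  # element -> TrieNode
        self.set_ids = []   # ids of sets ending here; duplicated sets share this node


def build_trie(sets):
    """Insert each set as a path of its elements in sorted order."""
    root = TrieNode()
    for set_id, s in enumerate(sets):
        node = root
        for token in sorted(set(s)):
            node = node.children.setdefault(token, TrieNode())
        node.set_ids.append(set_id)
    return root


def jaccard_query(root, query, threshold):
    """Return ids of indexed sets whose Jaccard similarity to query is >= threshold."""
    q = sorted(set(query))
    q_set = set(q)
    results = []
    stack = [(root, 0, 0)]  # (node, path length, overlap of path with query)
    while stack:
        node, depth, overlap = stack.pop()
        if node.set_ids and depth > 0:
            # The set stored here is exactly the path, so its similarity is exact.
            if overlap / (depth + len(q) - overlap) >= threshold:
                results.extend(node.set_ids)
        for token, child in node.children.items():
            d = depth + 1
            o = overlap + (token in q_set)
            # Sets below this child contain only larger elements, so at most
            # `remaining` further query elements can still match, and the union
            # already holds the d - o path elements that are not in the query.
            remaining = len(q) - bisect_right(q, token)
            upper_bound = (o + remaining) / (len(q) + (d - o))
            if upper_bound >= threshold:
                stack.append((child, d, o))  # otherwise the whole subtree is skipped
    return sorted(results)
```

In this sketch, one failed bound check discards every set stored under a node, and duplicated sets are reported together from a single node, which mirrors in spirit the two weaknesses of inverted-index methods that the abstract points to.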
Pages: 13