A Trie Based Set Similarity Query Algorithm

Cited by: 1
Authors
Jia, Lianyin [1 ,2 ]
Tang, Junzhuo [1 ]
Li, Mengjuan [3 ]
Li, Runxin [1 ]
Ding, Jiaman [1 ]
Chen, Yinong [4 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
[3] Yunnan Normal Univ, Lib, Kunming 650500, Peoples R China
[4] Arizona State Univ, Sch Comp & Augmented Intelligence, Tempe, AZ 85287 USA
Funding
National Natural Science Foundation of China;
Keywords
set similarity query; T-starTrie; FMNodes; TT-SSQ; EFFICIENT;
DOI
10.3390/math11010229
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most existing algorithms are based on inverted indexes; they usually filter unqualified sets one by one and lack sufficient support for duplicated sets, which leads to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie-based index for set similarity query, which naturally groups sets with the same prefix into one node and can filter all sets corresponding to that node at once, thereby significantly improving the efficiency of candidate generation. We find that the set similarity query problem can be transformed into the problem of detecting first-layer matching nodes (FMNodes) on T-starTrie, and we therefore design an efficient FMNode detection algorithm. Based on this, an efficient set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
Pages: 13
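
The abstract's central idea, that sets sharing a prefix sit under one trie node and can be filtered together, can be illustrated with a minimal sketch. This is not the paper's T-starTrie or TT-SSQ: it is a plain prefix trie with a single Jaccard upper-bound filter, assuming elements are comparable values (for example integers) and that each set is stored in sorted order. The names `TrieNode`, `PrefixTrie`, `insert`, and `query` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_right


class TrieNode:
    def __init__(self):
        self.children = {}  # element -> TrieNode
        self.set_ids = []   # ids of sets ending here; duplicated sets share one node


class PrefixTrie:
    """Illustrative prefix trie over sorted sets (not the paper's T-starTrie)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, set_id, elements):
        node = self.root
        for e in sorted(set(elements)):
            node = node.children.setdefault(e, TrieNode())
        node.set_ids.append(set_id)

    def query(self, elements, threshold):
        """Return {set_id: jaccard} for stored sets with Jaccard >= threshold."""
        q = sorted(set(elements))
        q_set = set(q)
        results = {}
        # Each stack entry: (node, prefix length, overlap with q, last element on path)
        stack = [(self.root, 0, 0, None)]
        while stack:
            node, depth, overlap, last = stack.pop()
            mismatches = depth - overlap  # prefix elements that are not in q
            # Query elements that could still be matched deeper in this subtree.
            remaining = len(q) if last is None else len(q) - bisect_right(q, last)
            union_lower = len(q) + mismatches
            # Upper bound on Jaccard for every set at or below this node.
            upper = (overlap + remaining) / union_lower if union_lower else 1.0
            if upper < threshold:
                continue  # all sets under this node are filtered at once
            if node.set_ids:
                sim = overlap / union_lower if union_lower else 1.0
                if sim >= threshold:
                    for sid in node.set_ids:
                        results[sid] = sim
            for e, child in node.children.items():
                stack.append((child, depth + 1, overlap + (e in q_set), e))
        return results
```

For example, after inserting `{1, 2, 3}` twice (ids 0 and 1), `{1, 2, 4}` (id 2), `{5, 6}` (id 3), and `{1, 2, 3, 4}` (id 4), calling `query({1, 2, 3}, 0.6)` reports ids 0, 1, and 4. The subtree starting at element 5 is discarded at its first node, and both duplicated sets are reported from a single node visit.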