A Trie Based Set Similarity Query Algorithm

Cited by: 1
Authors
Jia, Lianyin [1, 2]
Tang, Junzhuo [1]
Li, Mengjuan [3]
Li, Runxin [1]
Ding, Jiaman [1]
Chen, Yinong [4]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
[3] Yunnan Normal Univ, Lib, Kunming 650500, Peoples R China
[4] Arizona State Univ, Sch Comp & Augmented Intelligence, Tempe, AZ 85287 USA
Funding
National Natural Science Foundation of China
Keywords
set similarity query; T-starTrie; FMNodes; TT-SSQ; EFFICIENT
DOI
10.3390/math11010229
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most existing algorithms are based on inverted indexes; they usually filter unqualified sets one by one and lack sufficient support for duplicated sets, which leads to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie-based index for set similarity query that naturally groups sets with the same prefix into one node and can filter all sets corresponding to that node at once, thereby significantly improving the efficiency of candidate generation. We find that the set similarity query problem can be transformed into the problem of detecting first-layer matching nodes (FMNodes) on T-starTrie, and therefore design an efficient FMNode detection algorithm. Based on this, an efficient set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
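The idea of grouping sets that share a prefix and discarding a whole group at once can be illustrated with a minimal sketch. This is not T-starTrie, FMNode detection, or TT-SSQ, whose details the abstract does not give; it is a plain prefix trie over sorted tokens. The similarity function (Jaccard), the class names `TrieNode` and `SetTrie`, and the simple upper-bound test are all assumptions made for illustration.

```python
from bisect import bisect_right


class TrieNode:
    def __init__(self):
        self.children = {}  # token -> TrieNode
        self.set_ids = []   # ids of sets ending here; duplicate sets share one node


class SetTrie:
    """Hypothetical prefix trie over sorted token sets (illustration only)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, set_id, tokens):
        node = self.root
        for tok in sorted(set(tokens)):
            node = node.children.setdefault(tok, TrieNode())
        node.set_ids.append(set_id)

    def query(self, tokens, threshold):
        """Return (set_id, jaccard) pairs with jaccard >= threshold."""
        q = sorted(set(tokens))
        qset = set(q)
        results = []
        stack = [(self.root, 0, 0)]  # (node, path length, overlap with q)
        while stack:
            node, depth, overlap = stack.pop()
            if node.set_ids and depth > 0:
                jaccard = overlap / (len(q) + depth - overlap)
                if jaccard >= threshold:
                    # All duplicates stored at this node are reported together.
                    results.extend((sid, jaccard) for sid in node.set_ids)
            for tok, child in node.children.items():
                d = depth + 1
                o = overlap + (tok in qset)
                # Query tokens larger than tok could still match deeper down.
                remaining = len(q) - bisect_right(q, tok)
                # Upper bound on Jaccard for every set below this child:
                # intersection <= o + remaining, union >= |q| + mismatches.
                upper = (o + remaining) / (len(q) + d - o)
                if upper >= threshold:
                    stack.append((child, d, o))
                # Otherwise the whole subtree is skipped in one step.
        return sorted(results)


trie = SetTrie()
trie.insert(0, [1, 2, 3])
trie.insert(1, [3, 2, 1])  # duplicate of set 0, stored at the same node
trie.insert(2, [1, 2, 4])
trie.insert(3, [5, 6, 7])
print(trie.query([1, 2, 3], 0.5))
```

In this sketch the subtree under token 5 is pruned at the first level, so set 3 is never visited, while sets 0 and 1 are verified once and reported together because they end at the same node.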
Pages: 13