Subtree Similarity Search Based on Structure and Text

Cited by: 0
Authors
Mizokami, Takuya [1]
Bou, Savong [2]
Amagasa, Toshiyuki [2]
Affiliations
[1] Univ Tsukuba, Grad Sch Sci & Technol, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan
Keywords
Approximate matching; Similarity search; Tree edit distance; Algorithms; Efficient; Framework; Robust
DOI
10.1007/978-3-031-68323-7_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given a query tree, the subtree similarity search problem is to find all subtrees of a document tree that are similar to the query tree. The previous scan-based method extracts candidate subtrees based on size difference alone, which captures structural differences but ignores differences in the contents the trees represent. As a result, it suffers from two issues. First, for queries against a tree with a regular structure, subtrees are hard to tell apart by structural similarity, so a large number of candidates must be verified. Second, candidates are verified by computing the tree edit distance, whose cost is cubic in the number of tree nodes. In this paper, we propose a solution to the subtree similarity search problem based on both the structure and the contents of the trees. We demonstrate through experiments that our proposed method is faster than the previous scan-based methods and is competitive with index-based methods.
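The candidate-extraction issue described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, which this record does not detail. It assumes unit-cost edit operations, under which both the size difference and the label-multiset difference are lower bounds on the tree edit distance, so a subtree whose bound exceeds the threshold can be skipped without verification. The names `Node`, `candidates`, `tau`, and the toy labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical labeled tree node; the label stands in for the node's content."""
    label: str
    children: list = field(default_factory=list)


def subtrees(root):
    """Yield every node of the tree; each node is the root of one subtree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def label_bag(root):
    """Multiset of labels in the subtree rooted at `root`."""
    return Counter(n.label for n in subtrees(root))


def size_lower_bound(q_bag, s_bag):
    """Structure only: the edit distance is at least the difference in node counts."""
    return abs(sum(q_bag.values()) - sum(s_bag.values()))


def label_lower_bound(q_bag, s_bag):
    """Content aware: each unit-cost edit fixes at most one unmatched label per side."""
    only_q = sum((q_bag - s_bag).values())
    only_s = sum((s_bag - q_bag).values())
    return max(only_q, only_s)


def candidates(document, query, tau, use_labels):
    """Scan all subtrees and keep those whose lower bound does not exceed tau.

    Recomputing the label bag per subtree is quadratic; it keeps the sketch short.
    """
    q_bag = label_bag(query)
    kept = []
    for node in subtrees(document):
        s_bag = label_bag(node)
        if size_lower_bound(q_bag, s_bag) > tau:
            continue
        if use_labels and label_lower_bound(q_bag, s_bag) > tau:
            continue
        kept.append(node)
    return kept


# Toy document with a regular structure: three records of identical shape.
document = Node("db", [
    Node("rec", [Node("a"), Node("b")]),
    Node("rec", [Node("a"), Node("c")]),
    Node("rec", [Node("x"), Node("y")]),
])
query = Node("rec", [Node("a"), Node("b")])

by_size = candidates(document, query, tau=1, use_labels=False)
by_size_and_labels = candidates(document, query, tau=1, use_labels=True)
```

On this toy input the size-only filter cannot separate the three records, because they have the same shape, while the label-aware filter discards the record whose contents differ in two labels. Every surviving candidate would still need to be verified with an exact tree edit distance computation.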
Pages: 72-87
Number of pages: 16