Subtree Similarity Search Based on Structure and Text

Authors
Mizokami, Takuya [1]
Bou, Savong [2]
Amagasa, Toshiyuki [2]
Affiliations
[1] Univ Tsukuba, Grad Sch Sci & Technol, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan
Keywords
Approximate matching; Similarity search; Tree edit distance; Algorithms; Efficient; Framework; Robust
DOI
10.1007/978-3-031-68323-7_6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given a query tree, the subtree similarity search problem is to find all subtrees of a document tree that are similar to the query tree. The previous scan-based method extracts candidate subtrees based on size difference alone, so it considers only structural differences and ignores differences in the contents that the trees represent. As a result, it suffers from two issues. First, when the document tree has a regular structure, subtrees are hard to tell apart by structural similarity, so a large number of candidates must be verified. Second, each candidate is verified by computing the tree edit distance, whose cost is cubic in the number of tree nodes. In this paper, we propose a solution to the subtree similarity search problem that is based on both the structure and the contents of the trees. We demonstrate through experiments that the proposed method is faster than the previous scan-based methods and is competitive with index-based methods.
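The first issue can be illustrated with a minimal Python sketch. It is not the authors' method, which the abstract does not describe. It assumes unit-cost tree edit distance and uses two standard lower bounds: the size difference (structure only) and the larger tree size minus the label-multiset overlap (contents). The names `Node`, `collect`, and `filter_candidates` and the sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def collect(node, out):
    """Append (node, size, label_bag) for every subtree to out, in postorder."""
    size, bag = 1, Counter([node.label])
    for child in node.children:
        child_size, child_bag = collect(child, out)
        size += child_size
        bag += child_bag
    out.append((node, size, bag))
    return size, bag


def filter_candidates(doc, query, tau):
    """Return (size_pass, label_pass): subtree roots surviving each filter.

    Both filters are lower bounds on unit-cost tree edit distance, so a
    subtree they reject cannot be within distance tau of the query.
    """
    q_size, q_bag = collect(query, [])
    subs = []
    collect(doc, subs)
    size_pass, label_pass = [], []
    for node, size, bag in subs:
        if abs(size - q_size) > tau:  # structural (size) filter
            continue
        size_pass.append(node)
        overlap = sum((bag & q_bag).values())  # shared labels, with multiplicity
        if max(size, q_size) - overlap <= tau:  # content (label) filter
            label_pass.append(node)
    return size_pass, label_pass


def article(author, title):
    return Node("article", [Node("author=" + author), Node("title=" + title)])


# A document tree with a regular structure: every record has the same shape.
doc = Node("dblp", [article("Bou", "Trees"),
                    article("Lee", "Graphs"),
                    article("Bou", "Graphs")])
query = article("Bou", "Trees")

size_pass, label_pass = filter_candidates(doc, query, tau=1)
print(len(size_pass), len(label_pass))  # prints: 3 2
```

Because all three records have the same shape, the size filter keeps every one of them, while the label filter discards the record that shares only the `article` label with the query. Any surviving candidate would still need to be verified with an exact tree edit distance computation.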
Pages: 72-87
Page count: 16