Subtree Similarity Search Based on Structure and Text

Authors
Mizokami, Takuya [1]
Bou, Savong [2]
Amagasa, Toshiyuki [2]
Affiliations
[1] Univ Tsukuba, Grad Sch Sci & Technol, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan
Keywords
Approximate matching; Similarity search; Tree edit distance; Algorithms; Efficient; Framework; Robust
DOI
10.1007/978-3-031-68323-7_6
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Given a query tree, the subtree similarity search problem is to find all subtrees of a document tree that are similar to the query tree. The previous scan-based method extracts candidate subtrees based on size difference alone, so it considers only structural differences and ignores differences in the contents that the trees represent. As a result, it suffers from two issues. First, when the document tree has a regular structure, structural similarity barely distinguishes one subtree from another, so a large number of candidates must be verified. Second, each candidate is verified by computing the tree edit distance, whose cost is cubic in the number of tree nodes. In this paper, we propose a solution to the subtree similarity search problem that uses both the structure and the contents of the trees. Experiments demonstrate that the proposed method is faster than the previous scan-based methods and competitive with index-based methods.
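The filtering issue described in the abstract can be illustrated with a small sketch. It is not the method of the paper, which the abstract does not detail. It assumes unit-cost edit operations and compares two standard lower bounds on the tree edit distance: the size difference, and a label-multiset bound that also looks at node contents. The `Node` class, the helper names, the threshold `tau`, and the toy data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical labeled tree node; text content is modeled as a leaf label."""
    label: str
    children: list = field(default_factory=list)


def subtrees(root):
    """Yield every node of the tree in preorder; each node roots one subtree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def labels(root):
    """Multiset of node labels in the subtree rooted at `root`."""
    return Counter(n.label for n in subtrees(root))


def size_bound(q, c):
    """Structure-only lower bound: difference in node counts."""
    return abs(sum(q.values()) - sum(c.values()))


def label_bound(q, c):
    """Content-aware lower bound: larger size minus the number of shared labels."""
    shared = sum((q & c).values())  # multiset intersection
    return max(sum(q.values()), sum(c.values())) - shared


def candidates(document, query, tau, use_labels):
    """Keep subtrees whose lower bound does not exceed tau.

    Recomputing the label multiset per subtree is quadratic overall;
    that is acceptable for a sketch, not for real data.
    """
    q = labels(query)
    bound = label_bound if use_labels else size_bound
    return [n for n in subtrees(document) if bound(q, labels(n)) <= tau]


def item(title, year):
    """Build one record of the toy, regularly structured document."""
    return Node("item", [Node("title", [Node(title)]), Node("year", [Node(year)])])


doc = Node("db", [
    item("Tree edit distance", "2019"),
    item("Graph matching", "2021"),
    item("Tree edit distance", "2024"),
])
query = item("Tree edit distance", "2019")

by_size = candidates(doc, query, tau=1, use_labels=False)
by_label = candidates(doc, query, tau=1, use_labels=True)
print(len(by_size), len(by_label))
```

Because every record in the toy document has the same shape, the size bound keeps all three `item` subtrees, while the label bound discards the record that differs in both title and year. Any surviving candidate would still have to be verified with an actual tree edit distance computation, which this sketch leaves out.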
Pages: 72-87
Page count: 16