Subtree Similarity Search Based on Structure and Text

Cited by: 0

Authors:

Mizokami, Takuya [1]

Bou, Savong [2]

Amagasa, Toshiyuki [2]

Affiliations:

[1] Univ Tsukuba, Grad Sch Sci & Technol, Tsukuba, Ibaraki, Japan

[2] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan

Source:

BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2024 | 2024, Vol. 14912

Keywords:

Approximate Matching; Similarity search; Tree edit distance; TREE EDIT DISTANCE; ALGORITHMS; EFFICIENT; FRAMEWORK; ROBUST;

DOI:

10.1007/978-3-031-68323-7_6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Given a query tree, the subtree similarity search problem is to find all subtrees in a document tree that are similar to the query tree. The previous scan-based method extracts candidate subtrees based on the size difference, which considers only structural differences and ignores differences in the contents represented by the trees. For this reason, it suffers from two issues. First, for queries against a tree with a regular structure, it is difficult to differentiate subtrees by structural similarity alone, which yields a large number of candidate results to verify. Second, the candidates are verified by computing the tree edit distance, whose cost is cubic in the number of tree nodes. In this paper, we propose a solution for the subtree similarity search problem based on both the structure and the contents of the trees. We demonstrate through experiments that our proposed method outperforms the previous scan-based methods in terms of speed and is competitive with index-based methods.
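
The first issue in the abstract can be illustrated with a small sketch. It is not the authors' method, which this record does not describe. It assumes unit-cost edit operations and compares two standard lower bounds on the tree edit distance: the size difference used by scan-based filtering, and a label multiset bound that stands in here for "contents". The names `Node`, `candidates`, `size_lower_bound`, and `label_lower_bound`, and the sample data, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def subtrees(root):
    """Yield every node of the tree; each node roots one subtree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def size(node):
    return 1 + sum(size(child) for child in node.children)


def size_lower_bound(q, t):
    # Structure only: each insert or delete changes the size by one.
    return abs(size(q) - size(t))


def label_lower_bound(q, t):
    # Content-aware (assumed stand-in): each unit-cost edit changes
    # the multiset of labels by at most one element.
    lq = Counter(n.label for n in subtrees(q))
    lt = Counter(n.label for n in subtrees(t))
    common = sum((lq & lt).values())
    return max(sum(lq.values()), sum(lt.values())) - common


def candidates(query, document, tau, bound):
    """Subtrees that the lower bound cannot rule out at threshold tau."""
    return [t for t in subtrees(document) if bound(query, t) <= tau]


def record(name, city):
    return Node("record", [Node("name", [Node(name)]),
                           Node("city", [Node(city)])])


# A document with a regular structure: every record has the same shape.
doc = Node("db", [record("alice", "tsukuba"),
                  record("bob", "tsukuba"),
                  record("carol", "tokyo")])
query = record("alice", "tsukuba")

by_size = candidates(query, doc, 1, size_lower_bound)
by_label = candidates(query, doc, 1, label_lower_bound)
```

On this toy document the size filter cannot tell the three identically shaped records apart, while the label bound rules out the record that differs in both text values. Every surviving candidate would still need verification with an exact tree edit distance computation.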


Pages: 72-87

Page count: 16
