Evaluate structure similarity in XML documents with merge-edit-distance

Authors
Zhou, Chong [1]
Lu, Yansheng [1]
Zou, Lei [1]
Hu, Rong [1]
Affiliation
[1] Huazhong Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430074, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
XML is widely used as a standard for data representation and exchange among Web applications. In recent years, much effort has been devoted to querying, integrating, and clustering XML documents. Measuring the similarity among XML documents is the foundation of such applications. In this paper, we propose a new similarity measure for XML documents based on Merge-Edit-Distance (MED). MED preserves the distribution information of the common tree in XML document trees. We argue that this distribution information is useful for determining the similarity of XML documents. A novel algorithm is also proposed to calculate MED as follows. Given two XML document trees A and B, it compresses the two trees into one merge tree C and then transforms C into the common tree of A and B using defined operations such as "Delete", "Reduce", and "Combine". The cost of the operation sequence is defined as the MED. Experiments on real datasets show that the proposed similarity measure is effective.
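The abstract does not define the semantics or costs of "Delete", "Reduce", and "Combine", so the following Python sketch illustrates only the general merge-then-prune idea and is not the paper's MED algorithm. It assumes, purely for illustration, that a tree is a `(label, children)` tuple, that nodes are matched by their root-to-node label path, and that removing any non-common node costs 1. All function names are hypothetical.

```python
from collections import defaultdict

def label_paths(tree, prefix=()):
    """Yield the root-to-node label path of every node in a (label, children) tree."""
    label, children = tree
    path = prefix + (label,)
    yield path
    for child in children:
        yield from label_paths(child, path)

def merge_trees(a, b):
    """Overlay trees A and B: map each label path to the set of sources containing it."""
    merged = defaultdict(set)
    for name, tree in (("A", a), ("B", b)):
        for path in label_paths(tree):
            merged[path].add(name)
    return dict(merged)

def common_paths(merged):
    """Paths present in both trees (a stand-in for the common tree)."""
    return {path for path, sources in merged.items() if sources == {"A", "B"}}

def delete_only_cost(merged):
    """Assumed unit-cost deletions needed to prune the merge down to the common paths."""
    return sum(1 for sources in merged.values() if len(sources) < 2)

# Two small example document trees (made-up data).
A = ("book", [("title", []), ("author", [("name", [])])])
B = ("book", [("title", []), ("price", [])])

C = merge_trees(A, B)
print(sorted(common_paths(C)))
print(delete_only_cost(C))
```

Because this simplification collapses sibling nodes that share a label and uses only deletions, it discards exactly the kind of distribution information that the abstract says MED is designed to keep; it is meant only to make the merge tree and common tree concrete.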
Pages: 301-311
Page count: 11