Evaluate structure similarity in XML documents with merge-edit-distance

Authors
Zhou, Chong [1]
Lu, Yansheng [1]
Zou, Lei [1]
Hu, Rong [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430074, Peoples R China
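Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract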
XML is widely used as a standard for data representation and exchange among Web applications. In recent years, considerable effort has been devoted to querying, integrating, and clustering XML documents. Measuring the similarity among XML documents is the foundation of such applications. In this paper, we propose a new similarity measure for XML documents based on Merge-Edit-Distance (MED). MED preserves the distribution information of the common tree in XML document trees, and we argue that this distribution information is useful for determining the similarity of XML documents. We also propose a novel algorithm to calculate MED: given two XML document trees A and B, it compresses the two trees into one merge tree C and then transforms C into the common tree of A and B using defined operations such as "Delete", "Reduce", and "Combine". The cost of this operation sequence is defined as the MED. Experiments on real datasets provide evidence that the proposed similarity measure is effective.
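The merge-then-transform idea in the abstract can be illustrated with a rough Python sketch. The abstract does not define the "Delete", "Reduce", or "Combine" operations or their costs, so this is not the paper's algorithm. It is a simplified stand-in that overlays two XML trees into one merge tree and counts, at an assumed unit cost, the nodes that would have to be deleted to leave only the nodes shared by both documents. The names `MergeNode`, `add_tree`, `build_merge_tree`, and `delete_cost` are hypothetical, and matching children by tag alone (so repeated sibling tags collapse into one node) is an assumption made here for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class MergeNode:
    """Node of the merge tree; `sources` records which input trees contain it."""
    tag: str
    sources: set = field(default_factory=set)
    children: dict = field(default_factory=dict)  # tag -> MergeNode


def add_tree(node: MergeNode, elem: ET.Element, source: str) -> None:
    """Overlay an XML element onto the merge tree, matching children by tag.

    Assumption: siblings with the same tag are collapsed into a single node.
    """
    node.sources.add(source)
    for child in elem:
        merged_child = node.children.setdefault(child.tag, MergeNode(child.tag))
        add_tree(merged_child, child, source)


def build_merge_tree(xml_a: str, xml_b: str) -> MergeNode:
    """Merge document trees A and B into one tree under a virtual root."""
    root = MergeNode("#root", sources={"A", "B"})
    for source, text in (("A", xml_a), ("B", xml_b)):
        elem = ET.fromstring(text)
        child = root.children.setdefault(elem.tag, MergeNode(elem.tag))
        add_tree(child, elem, source)
    return root


def delete_cost(node: MergeNode) -> int:
    """Count merge-tree nodes that occur in only one input tree.

    Assumption: each deletion costs 1; the paper's cost model may differ.
    """
    own = 0 if node.sources == {"A", "B"} else 1
    return own + sum(delete_cost(c) for c in node.children.values())


doc_a = "<book><title/><author/><year/></book>"
doc_b = "<book><title/><author/><publisher/></book>"
print(delete_cost(build_merge_tree(doc_a, doc_b)))  # prints 2
```

In this example the merge tree contains `book`, `title`, and `author` from both documents, plus `year` (only in A) and `publisher` (only in B), so two unit-cost deletions leave the common tree. A faithful MED implementation would also need the paper's definitions of "Reduce" and "Combine", which the abstract does not give.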
Pages: 301-311
Number of pages: 11
Related papers
50 entries in total
  • [1] Minimum Tree Edit distance between XML and Probabilistic XML Documents
    Ma, Haitao
    Xu, Changming
    Fang, Miao
    Yu, Changyong
    2014 IEEE WORKSHOP ON ELECTRONICS, COMPUTER AND APPLICATIONS, 2014, : 391 - 394
  • [2] Edit Distance between Merge Trees
    Sridharamurthy, Raghavendra
    Bin Masood, Talha
    Kamakshidasan, Adhitya
    Natarajan, Vijay
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2020, 26 (03) : 1518 - 1531
  • [3] Structure and Content Similarity for Clustering XML Documents
    Zhang, Lijun
    Li, Zhanhuai
    Chen, Qun
    Li, Ning
    WEB-AGE INFORMATION MANAGEMENT, 2010, 6185 : 116 - 124
  • [4] Classifying XML documents based on Structure/Content similarity
    Xing, Guangming
    Guo, Jinhua
    Xia, Zhonghang
    COMPARATIVE EVALUATION OF XML INFORMATION RETRIEVAL SYSTEMS, 2007, 4518 : 444 - 457
  • [5] A methodology for measuring structure similarity of fuzzy XML documents
    Zhen Zhao
    Zongmin Ma
    Computing, 2017, 99 : 493 - 506
  • [6] A methodology for measuring structure similarity of fuzzy XML documents
    Zhao, Zhen
    Ma, Zongmin
    COMPUTING, 2017, 99 (05) : 493 - 506
  • [7] Similarity measurement of XML documents based on structure and contents
    Kim, Tae-Soon
    Lee, Ju-Hong
    Song, Jae-Won
    Kim, Deok-Hwan
    COMPUTATIONAL SCIENCE - ICCS 2007, PT 3, PROCEEDINGS, 2007, 4489 : 902 - +
  • [8] A Deformation-based Edit Distance for Merge Trees
    Wetzels, Florian
    Garth, Christoph
    2022 IEEE WORKSHOP ON TOPOLOGICAL DATA ANALYSIS AND VISUALIZATION (TOPOINVIS 2022), 2022, : 29 - 38
  • [9] A Survey on Tree Edit Distance Lower Bound Estimation Techniques for Similarity Join on XML Data
    Li, Fei
    Wang, Hongzhi
    Li, Jianzhong
    Gao, Hong
    SIGMOD RECORD, 2013, 42 (04) : 29 - 39
  • [10] Phrase similarity through the edit distance
    Vilares, M
    Ribadas, FJ
    Vilares, J
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2004, 3180 : 306 - 317