A Novel Method for Measuring Structure and Semantic Similarity of XML Documents Based on Extended Adjacency Matrix

Cited by: 2
Authors
Zhang, Xue-Liang [1]
Yang, Ting [1]
Fan, Bao-Quan [1]
Wang, Xu [1]
Wei, Jin-Mao [1]
Affiliation
[1] Nankai Univ, Coll Informat Tech Sci, Tianjin 300071, Peoples R China
Keywords
similarity; XML; semantic; structure; adjacency matrix
DOI
10.1016/j.phpro.2012.02.215
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline code
08
Abstract
Similarity measurement of XML documents is crucial for approximate search and document classification in XML-oriented applications. Several methods have been proposed for this purpose; however, few of them capture both structural and semantic information in a way that supports effective similarity measurement. In this paper, we present a new method for computing the structural and semantic similarity of XML documents based on an extended adjacency matrix (EAM). Unlike a general adjacency matrix, which records only relations between adjacent layers, an EAM also stores structural information between ancestor and descendant layers. To measure the similarity of two XML documents, the proposed method first stores their structural and semantic information in two extended adjacency matrices, M1 and M2, and then computes the similarity of the two documents as cos(M1, M2). Experimental results on benchmark data show that the method achieves high efficiency and accuracy. (C) 2011 Published by Elsevier B.V. Selection and/or peer-review under responsibility of ICAPIE Organization Committee.
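The abstract describes the pipeline only at a high level. As an illustration, the Python sketch below builds a sparse extended adjacency matrix for each document and compares the two with a cosine. It is not the authors' implementation: the 1/d weighting for an ancestor-descendant pair at depth distance d, the lowercasing of tag names as a crude stand-in for the paper's semantic component, and the function names extended_adjacency and eam_cosine are all hypothetical assumptions.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict


def extended_adjacency(xml_text):
    """Build a sparse EAM: (ancestor_tag, descendant_tag) -> accumulated weight.

    ASSUMPTION (hypothetical): a pair at depth distance d contributes 1/d,
    so parent-child pairs (d = 1) weigh most and deeper ancestors less.
    """
    root = ET.fromstring(xml_text)
    eam = defaultdict(float)

    def walk(node, ancestors):
        # Hypothetical normalisation standing in for the semantic component.
        tag = node.tag.lower()
        # ancestors runs root..parent; reversed() yields the parent first (d = 1).
        for d, anc in enumerate(reversed(ancestors), start=1):
            eam[(anc, tag)] += 1.0 / d
        for child in node:
            walk(child, ancestors + [tag])

    walk(root, [])
    return eam


def eam_cosine(m1, m2):
    """cos(M1, M2): inner product of the flattened matrices over their norms."""
    dot = sum(w * m2.get(k, 0.0) for k, w in m1.items())
    n1 = math.sqrt(sum(w * w for w in m1.values()))
    n2 = math.sqrt(sum(w * w for w in m2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # a document with no ancestor-descendant pairs
    return dot / (n1 * n2)


if __name__ == "__main__":
    a = "<book><title/><author><name/></author></book>"
    b = "<book><title/><writer><name/></writer></book>"
    m_a, m_b = extended_adjacency(a), extended_adjacency(b)
    print(round(eam_cosine(m_a, m_a), 6))  # 1.0
    print(round(eam_cosine(m_a, m_b), 4))  # 0.3846
```

Because all weights are non-negative, the score falls in [0, 1]. A dense matrix indexed by the union of tag labels would give the same cosine; the dict keyed by (ancestor, descendant) pairs simply avoids storing entries for tags that never co-occur.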
Pages: 1452-1461
Page count: 10
Related papers
50 in total
  • [21] A Methodology for Using Edges to Measure Structural and Semantic Similarity of XML Documents
    Qiu, Hong-Jun
    Yu, Wen-Jing
    Proceedings of 2009 International Conference on Machine Learning and Cybernetics, Vols 1-6, 2009: 1653+
  • [23] MCRWR: a new method to measure the similarity of documents based on semantic network
    Pan, Xianwei
    Huang, Peng
    Li, Shan
    Cui, Lei
    BMC Bioinformatics, 2022, 23(1)
  • [24] A novel wordnet-based approach for measuring semantic similarity
    Zhu, Xinhua
    Li, Fei
    Chen, Hongchao
    Mao, Junqing
    Journal of Information and Computational Science, 2015, 12(13): 4919-4927
  • [25] Measuring Semantic Similarity between Words Using Web Documents
    Takale, Sheetal A.
    Nandgaonkar, Sushma S.
    International Journal of Advanced Computer Science and Applications, 2010, 1(4): 78-85
  • [26] Measuring semantic similarity of documents with weighted cosine and fuzzy logic
    Huetle-Figueroa, Juan
    Perez-Tellez, Fernando
    Pinto, David
    Journal of Intelligent & Fuzzy Systems, 2020, 39(2): 2263-2278
  • [27] An improved method for classifying XML documents based on structure and content
    Zhang, Na
    Zhang, Dongzhan
    Yu, Ye
    Duan, Jiangjiao
    Third International Symposium on Computer Science and Computational Technology (ISCSCT 2010), 2010: 426-430
  • [28] Ontology based semantic similarity comparison of documents
    Oleshchuk, V
    Pedersen, A
    14th International Workshop on Database and Expert Systems Applications, Proceedings, 2003: 735-738
  • [29] Clustering XML Documents Using Structure and Content based on a New Similarity Function OverallSimSUX
    Magdaleno, Damny
    Fuentes, Vett E.
    Garcia, Maria M.
    Computacion y Sistemas, 2015, 19(1): 151-161
  • [30] Clustering Algorithm Based on Semantic Distance for XML Documents
    Yang, Lingxian
    Gu, Jinguang
    Chen, Heping
    First International Workshop on Database Technology and Applications, Proceedings, 2009: 549+