Clustering XML Documents Using Structure and Content based on a New Similarity Function OverallSimSUX

Cited by: 6
Authors
Magdaleno, Damny [1]
Fuentes, Vett E. [1]
Garcia, Maria M. [1]
Affiliation
[1] Univ Cent Marta Abreu de Las Villas UCLV, Comp Sci Dept, Villa Clara, Cuba
Source
COMPUTACION Y SISTEMAS | 2015 / Vol. 19 / No. 1
Keywords
Clustering; XML; structure and content; similarity;
DOI
10.13053/CyS-19-1-1922
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Every day, more digital data in semi-structured formats become available on the World Wide Web, corporate intranets, and other media. Knowledge management through information search and processing is essential in the field of academic writing. This task is becoming increasingly complex and challenging, mainly because document collections are usually heterogeneous, large, diverse, and dynamic. Meeting these challenges requires better management of the time needed to process scientific information. In this paper, we propose a new method for the automatic clustering of XML documents based on their content and structure, together with a new similarity function, OverallSimSUX, which captures the degree of similarity among documents. Experimental evaluation on data sets showed better results than those reported in previous work.
Pages: 151-161
Page count: 11