Extended VSM for XML Document Classification Using Frequent Subtrees

Cited by: 0
Authors
Yang, Jianwu [1]
Wang, Songlin [1]
Affiliation
[1] Peking University, Institute of Computer Science and Technology, Beijing 100871, People's Republic of China
Keywords
XML Document; Classification; Vector Space Model (VSM); Structured Link Vector Model (SLVM); Frequent Subtree
DOI
Not available
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The structured link vector model (SLVM) is a representation for modeling XML documents that extends the conventional vector space model (VSM) by incorporating document structure. In this paper, we describe our SLVM-based approach to XML document classification in the Document Mining Challenge of INEX 2009, in which closed frequent subtrees are used as structural units for extracting content from XML documents and the chi-square test is used for feature selection.
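A minimal Python sketch of the chi-square feature-selection step, assuming a simplified SLVM-style representation in which each document has already been split into named structural units. In the paper those units come from closed frequent subtrees; here the unit labels (such as "title" and "body") and all function names are hypothetical placeholders, and the subtree mining itself is not shown. Each feature is a (unit, term) pair scored with the standard 2x2 chi-square statistic, keeping the maximum over classes; that aggregation is a common choice, not necessarily the one the authors used.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical SLVM-style document: structural unit label -> terms inside it.
# (Assumption: units are given; the paper derives them from closed frequent subtrees.)
Doc = Dict[str, List[str]]
Feature = Tuple[str, str]  # (structural unit, term)


def slvm_features(doc: Doc) -> Counter:
    """Flatten per-unit bags of words into (unit, term) frequency counts,
    i.e. the term-by-unit matrix laid out as a single sparse vector."""
    counts: Counter = Counter()
    for unit, terms in doc.items():
        for term in terms:
            counts[(unit, term)] += 1
    return counts


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2x2 feature/class contingency table.
    a: in class, has feature      b: not in class, has feature
    c: in class, lacks feature    d: not in class, lacks feature"""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs: List[Doc], labels: List[str], k: int) -> List[Feature]:
    """Return the k features with the highest max-over-classes chi-square score."""
    present = [set(slvm_features(doc)) for doc in docs]
    vocab = set().union(*present)
    n_docs = len(docs)
    scores: Dict[Feature, float] = {}
    for f in vocab:
        best = 0.0
        for cls in set(labels):
            a = sum(1 for p, y in zip(present, labels) if f in p and y == cls)
            b = sum(1 for p, y in zip(present, labels) if f in p and y != cls)
            n_cls = labels.count(cls)
            c = n_cls - a            # class documents without the feature
            d = n_docs - n_cls - b   # other documents without the feature
            best = max(best, chi_square(a, b, c, d))
        scores[f] = best
    # Highest score first; ties broken by feature so the result is deterministic.
    ranked = sorted(vocab, key=lambda f: (-scores[f], f))
    return ranked[:k]


# Toy usage with made-up documents and class labels.
docs = [
    {"title": ["xml", "mining"], "body": ["tree"]},
    {"title": ["xml", "retrieval"], "body": ["index"]},
    {"title": ["image"], "body": ["pixel", "tree"]},
    {"title": ["image", "retrieval"], "body": ["pixel"]},
]
labels = ["xml", "xml", "vision", "vision"]
print(select_features(docs, labels, 2))  # → [('body', 'pixel'), ('title', 'image')]
```

Because features are keyed by unit as well as term, the same word can be kept in one structural unit and discarded in another, which is the point of moving from a flat VSM to a structure-aware representation.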
Pages: 441-448
Page count: 8
Related papers (50 in total)
  • [31] A Novel Document and Query Similarity Indexing using VSM for Unstructured Documents
    Reshma, P. K.
    Rajagopal, Suharshala
    Lajish, V. L.
    2020 6TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2020, : 676 - 681
  • [32] Mining the Frequent Patterns of Named Entities for Long Document Classification
    Wang, Bohan
    Qi, Rui
    Gao, Jinhua
    Zhang, Jianwei
    Yuan, Xiaoguang
    Ke, Wenjun
    APPLIED SCIENCES-BASEL, 2022, 12 (05):
  • [33] Document image representation using XML technologies
    El-Kwae, EA
    Atmakuri, KH
    DOCUMENT RECOGNITION AND RETRIEVAL IX, 2002, 4670 : 109 - 120
  • [34] Using a semantic model and XML for document annotation
    Czejdo, BD
    Sobaniec, C
    INTELLIGENT PROBLEM SOLVING: METHODOLOGIES AND APPROACHES, PROCEEDINGS, 2000, 1821 : 236 - 241
  • [35] XML document clustering using common XPath
    Leung, HP
    Chung, FL
    Chan, SCF
    Luk, R
    INTERNATIONAL WORKSHOP ON CHALLENGES IN WEB INFORMATION RETRIEVAL AND INTEGRATION, PROCEEDINGS, 2005, : 91 - 96
  • [36] Extended corner classification neural network based document classification approach
    Chen, En-Hong
    Zhang, Zhen-Ya
    Aihara, Kazuyuki
    Wang, Xu-Fa
    Ruan Jian Xue Bao/Journal of Software, 2002, 13 (05): 871 - 878
  • [37] Hierarchical document clustering using frequent itemsets
    Fung, BCM
    Wang, K
    Ester, M
    PROCEEDINGS OF THE THIRD SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2003, : 59 - 70
  • [38] Sequential pattern mining for structure-based XML document classification
    Garboni, Calin
    Masseglia, Florent
    Trousse, Brigitte
    ADVANCES IN XML INFORMATION RETRIEVAL AND EVALUATION, 2006, 3977 : 458 - 468
  • [39] Natural Language Inference for Arabic Using Extended Tree Edit Distance with Subtrees
    Alabbas, Maytham
    Ramsay, Allan
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2013, 48 : 1 - 22
  • [40] Web document classification based on extended rough set
    Yi, GX
    Hu, HP
    Lu, ZD
    PDCAT 2005: SIXTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2005, : 916 - 918