Extended VSM for XML Document Classification Using Frequent Subtrees

Cited by: 0
Authors
Yang, Jianwu [1]
Wang, Songlin [1]
Affiliation
[1] Peking Univ, Inst Comp Sci & Tech, Beijing 100871, Peoples R China
Source
Keywords
XML Document; Classification; Vector Space Model (VSM); Structured Link Vector Model (SLVM); Frequent Subtree;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The structured link vector model (SLVM) is a representation for modeling XML documents that extends the conventional vector space model (VSM) by incorporating document structure. In this paper, we describe our SLVM-based classification approach for XML documents in the Document Mining Challenge of INEX 2009, in which closed frequent subtrees serve as the structural units for extracting content from the XML documents and the Chi-square test is used for feature selection.
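The abstract does not say how the Chi-square test is applied, so the following is only a minimal sketch of the standard 2x2 Chi-square score commonly used for feature selection in text classification, not the authors' implementation. The function names `chi_square` and `select_features`, the representation of each document as a set of feature strings, and the toy data are all assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 feature/class contingency table.

    a: documents in the class that contain the feature
    b: documents outside the class that contain the feature
    c: documents in the class that lack the feature
    d: documents outside the class that lack the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, target, k):
    """Return the k features with the highest Chi-square score for `target`.

    docs   : list of sets of feature strings (hypothetical representation)
    labels : list of class labels, aligned with docs
    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = set().union(*docs)
    scores = {}
    for feature in vocab:
        a = b = c = d = 0
        for doc, label in zip(docs, labels):
            present = feature in doc
            in_class = label == target
            if present and in_class:
                a += 1
            elif present:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores[feature] = chi_square(a, b, c, d)
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Toy corpus (illustrative data only, not from the paper).
docs = [{"xml", "tree"}, {"xml", "node"}, {"sport", "tree"}, {"sport", "ball"}]
labels = ["cs", "cs", "sp", "sp"]
top = select_features(docs, labels, target="cs", k=2)
```

In this toy corpus, "xml" and "sport" each split the two classes perfectly and therefore score highest, while "tree" appears equally often in both classes and scores zero. In an SLVM setting the features would presumably be terms scoped to structural units rather than bare strings, but that detail is not given in the abstract.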
Pages: 441-448
Page count: 8