SVM based Chinese web page automatic classification

Cited by: 4
Authors: Liang, JZ [1]
Affiliation: [1] Zhejiang Normal Univ, Inst Comp Sci, Jinhua 321004, Peoples R China
Keywords: support vector machine; statistic learning; web page; text classification; pattern recognition
DOI: 10.1109/ICMLC.2003.1259884
Chinese Library Classification (CLC) number: TP18 [Artificial intelligence theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
This paper deals with Chinese web page classification based on support vector machines (SVMs). First, methods for feature extraction and feature selection based on textual keywords are proposed. Several issues in statistical learning theory, support vector machines, and their application to classification are then discussed, and a quadratic programming algorithm for constructing the SVM classifier is described. In the experiments, a sample set of 5096 samples is drawn from the web edition of the Chinese newspaper People's Daily and divided into a training set of 3398 samples and a test set of 1698 samples. Two kernel functions, polynomial and radial basis function (RBF), are used to construct SVM classifiers. The final classification accuracies are 89.81% and 86.51% for the polynomial and RBF classifiers, respectively.
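The abstract combines three ingredients: keyword-based feature vectors, a quadratic-programming (QP) solution of the SVM dual, and a choice between polynomial and RBF kernels. The Python sketch below shows one way they can fit together on a toy problem. It is not the author's implementation: the keyword vocabulary and documents are hypothetical, the kernel parameters (`degree=2`, `coef0=1`, `gamma=1`) and `C=1` are assumptions, the task is reduced to two classes labeled +1/-1, and the dual QP is solved by projected coordinate ascent (with the bias folded into the kernel) as a stand-in for a general-purpose QP routine.

```python
import math
from collections import Counter

# Hypothetical keyword vocabulary and toy documents (already word-segmented).
# They stand in for the paper's keyword features and are NOT its People's Daily data.
VOCAB = ["经济", "市场", "增长", "足球", "比赛", "球队"]
TRAIN = [
    (["经济", "市场", "市场", "增长"], 1),
    (["经济", "增长", "增长"], 1),
    (["市场", "经济"], 1),
    (["足球", "比赛", "球队"], -1),
    (["比赛", "比赛", "球队"], -1),
    (["足球", "球队", "球队"], -1),
]
TEST = [(["市场", "增长"], 1), (["足球", "比赛"], -1)]


def tf_vector(tokens, vocab):
    """Term-frequency vector over the keyword vocabulary, scaled to unit length."""
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel K(x, z) = (x.z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_svm(X, y, kernel, C=1.0, epochs=500):
    """Solve the soft-margin SVM dual QP by projected coordinate ascent.

    Assumed stand-in for a general QP solver, not the paper's algorithm.
    Adding 1 to every kernel value folds the bias into the kernel, which drops the
    equality constraint sum(alpha_i * y_i) = 0; each alpha_i then has an exact
    one-variable maximizer, clipped to the box [0, C].
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) + 1.0 for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            alpha[i] = min(max(alpha[i] + (1.0 - y[i] * f_i) / K[i][i], 0.0), C)
    return alpha


def predict(x, X, y, alpha, kernel):
    """Sign of the decision function, summed over support vectors (alpha > 0)."""
    score = sum(a * yi * (kernel(xi, x) + 1.0) for a, yi, xi in zip(alpha, y, X) if a > 0)
    return 1 if score >= 0 else -1


def evaluate(kernel):
    """Train on TRAIN and return the fraction of TEST documents classified correctly."""
    X = [tf_vector(doc, VOCAB) for doc, _ in TRAIN]
    y = [label for _, label in TRAIN]
    alpha = train_svm(X, y, kernel)
    hits = sum(predict(tf_vector(doc, VOCAB), X, y, alpha, kernel) == label for doc, label in TEST)
    return hits / len(TEST)


if __name__ == "__main__":
    for name, kern in [("polynomial", poly_kernel), ("RBF", rbf_kernel)]:
        print(name, evaluate(kern))
```

Running the script prints the test accuracy of each kernel on the six-document toy set; these figures say nothing about the People's Daily experiment. For more than two categories, one common option is to train one such binary classifier per category (one-vs-rest), which the sketch does not show.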
Pages: 2265-2268
Number of pages: 4
Related papers (50 in total)
  • [31] Application of SVM in web page categorization
    Xue, Weimin
    Huang, Weitong
    Lu, Yuchang
    2006 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, 2006, : 469 - +
  • [32] Web Page Classification Based on Social Annotations
    Shen, J.
    Xu, F. Y.
    Bi, L.
    Wei, L. H.
    He, K.
    Zhu, Y.
    ITESS: 2008 PROCEEDINGS OF INFORMATION TECHNOLOGY AND ENVIRONMENTAL SYSTEM SCIENCES, PT 1, 2008, : 1115 - 1121
  • [33] An approach to Web page classification based on granules
    Duan, Qiguo
    Miao, Duoqian
    Wang, Ruizhi
    Chen, Min
    PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE: WI 2007, 2007, : 279 - 282
  • [34] Web Potential Customer Classification Based on SVM
    Sun, Lei
    Duan, Zhu
    2012 INTERNATIONAL CONFERENCE ON INDUSTRIAL CONTROL AND ELECTRONICS ENGINEERING (ICICEE), 2012, : 568 - 570
  • [35] Automatic Summarization of Web Page Based on Statistics and Structure
    Zheng, Shuangyi
    Yu, Junyang
    KNOWLEDGE DISCOVERY AND DATA MINING, 2012, 135 : 643 - +
  • [36] The Role of Word String Patterns in Chinese Web Page Genre Classification
    Wu, Yangyang
    Wu, Chukun
    IMCIC 2010: INTERNATIONAL MULTI-CONFERENCE ON COMPLEXITY, INFORMATICS AND CYBERNETICS, VOL I (POST-CONFERENCE EDITION), 2010, : 204 - 208
  • [37] Classification of Chinese Herbal medicines based on SVM
    Accession no. 20144900289658
    (1) School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China; (2) School of Engineering, Auckland University of Technology, Auckland 1142, New Zealand
    Sponsors: Future University Hakodate; IEEE Sapporo Section; Xiamen University
    Publisher: Institute of Electrical and Electronics Engineers Inc., United States
  • [38] Classification of Chinese Herbal Medicines Based on SVM
    Luo Dehan
    Wang Jia
    Chen Yimin
    Hamid, GholamHosseini
    2014 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE, ELECTRONICS AND ELECTRICAL ENGINEERING (ISEEE), VOLS 1-3, 2014, : 452 - +
  • [39] Malicious Web Page Detection Based on Feature Classification
    Phakoontod, Chanachai
    Limthanmaphon, Benchaphon
    2012 7TH INTERNATIONAL CONFERENCE ON COMPUTING AND CONVERGENCE TECHNOLOGY (ICCCT2012), 2012, : 66 - 71
  • [40] A Web Page Classification Algorithm Based On Link Information
    Xu, Zhaohui
    Yan, Fuliang
    Qin, Jie
    Zhu, Haifeng
    2011 TENTH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING AND SCIENCE (DCABES), 2011, : 82 - 86