A method of filtering Chinese webpage

Cited by: 0
Authors
Affiliations
[1] Liu, Jie
[2] Luo, Li-Ming
[3] Wu, Yu-Hang
[4] Ma, Yi-Fang
[5] Cai, Hong-Mei
Source
Liu, J. (liujxxxy@126.com) | Year: 1600 / Volume: Beijing Institute of Technology / Issue: 34
Keywords
Semantic Web - Semantics - Websites;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
To counter the adverse effects of useless webpages, a method based on the Bayesian classification algorithm and a domain ontology is proposed for filtering unwanted Chinese webpages. The method first calculates the weights of domain feature words from positive and negative domain webpages, establishes a domain feature lexicon, constructs the domain ontology, and derives a weight library of ontology elements from the positive domain webpages. It then obtains candidate webpages using the Bayesian classification algorithm. Finally, it semantically analyzes and filters the candidates according to the domain ontology. The method not only distinguishes positive from negative webpages within the same field but also performs well in real-time webpage filtering. Experiments on a large number of game-related webpages show promising results: precision and recall both exceed 98%, and semantically analyzing one game webpage takes 1 to 2 s on average, which has little effect on the user's browsing.
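The two-stage flow described in the abstract (Bayesian candidate selection followed by an ontology-based semantic check) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the add-one smoothing, the averaging used for the ontology score, and the `threshold` value are all assumptions made for illustration. Pages are assumed to be already segmented into word tokens, and the label `"pos"` is assumed to mark target-domain pages that should be filtered.

```python
import math
from collections import Counter


def train_naive_bayes(pos_docs, neg_docs):
    """Estimate per-class word log-probabilities with add-one smoothing."""
    vocab = {w for doc in pos_docs + neg_docs for w in doc}
    n_docs = len(pos_docs) + len(neg_docs)
    model = {}
    for label, docs in (("pos", pos_docs), ("neg", neg_docs)):
        counts = Counter(w for doc in docs for w in doc)
        total = sum(counts.values()) + len(vocab)
        model[label] = {
            "prior": math.log(len(docs) / n_docs),
            "logp": {w: math.log((counts[w] + 1) / total) for w in vocab},
        }
    return model


def bayes_label(model, doc):
    """Return the class with the higher posterior; unseen words are skipped."""
    scores = {
        label: m["prior"] + sum(m["logp"][w] for w in doc if w in m["logp"])
        for label, m in model.items()
    }
    return max(scores, key=scores.get)


def ontology_score(doc, element_weights):
    """Average ontology-element weight over the page's tokens (assumed scoring rule)."""
    if not doc:
        return 0.0
    return sum(element_weights.get(w, 0.0) for w in doc) / len(doc)


def should_filter(model, element_weights, doc, threshold=0.3):
    """Stage 1: Bayes selects candidates. Stage 2: the ontology score confirms them."""
    if bayes_label(model, doc) != "pos":
        return False
    return ontology_score(doc, element_weights) >= threshold
```

In this sketch a page is filtered only when both stages agree, which mirrors the abstract's claim that the ontology step separates positive from negative pages in the same field after the cheaper Bayesian step has narrowed the candidates. The weight table passed as `element_weights` is a hypothetical stand-in for the paper's weight library of ontology elements.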
Related papers
50 in total
  • [1] A Method Shielding the Chinese Game Webpage Based on Ontology
    Liu, Jie
    Bai, Ling
    Huang, XiangYang
    ADVANCED RESEARCH ON INFORMATION SCIENCE, AUTOMATION AND MATERIAL SYSTEM, PTS 1-6, 2011, 219-220 : 1454 - +
  • [2] Multi-layer Filtering Webpage Classification Method Based on SVM
    Chen, Yiwen
    Yao, Zhilin
    HUMAN CENTERED COMPUTING, 2019, 11956 : 554 - 559
  • [3] Sensitive Webpage Filter Based on Multiple Filtering
    Wang Dongmei
    Ma Ming
    Sun Yan
    INDUSTRIAL INSTRUMENTATION AND CONTROL SYSTEMS, PTS 1-4, 2013, 241-244 : 2891 - 2896
  • [4] Learning outliers to refine a corpus for Chinese webpage categorization
    Luo, DS
    Wang, XH
    Wu, XH
    Chi, HS
    ADVANCES IN NATURAL COMPUTATION, PT 1, PROCEEDINGS, 2005, 3610 : 167 - 178
  • [5] An Improvement Method of Duplicate Webpage Detection
    Zhang, Chengqi
    Shang, Wenqian
    Li, Yafeng
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON ELECTRONIC & MECHANICAL ENGINEERING AND INFORMATION TECHNOLOGY (EMEIT-2012), 2012, 23
  • [6] HAIF: A Hierarchical Attention-Based Model of Filtering Invalid Webpage
    Zhou, Chaoran
    Zhao, Jianping
    Ma, Tai
    Zhou, Xin
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (05) : 659 - 668
  • [7] Application of the Naive Bayesian Method with user current usage and hierarchy from website in Chinese Webpage Classification
    Li, Jinsong
    Xue, Weimin
    Dong, Nanping
    2007 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS, VOLS 1-6, 2007, : 1364 - +
  • [8] An optimal method for URL design of webpage journals
    Mousavilou, Zahra
    Oskouei, Rozita Jamili
    SOCIAL NETWORK ANALYSIS AND MINING, 2018, 8 (01)
  • [9] An Application of Lexical Semantics of Chinese in Webpage Keyword Extraction Algorithm
    Wang, Chanjuan
    Sun, Bin
    Zhang, Lu
    11TH CHINESE LEXICAL SEMANTICS WORKSHOP (CKSW2010), 2010, : 341 - 347
  • [10] Application of Project Teaching Method in Comprehensive Webpage Design
    Han, Baoyu
    2013 2ND INTERNATIONAL CONFERENCE ON SOCIAL SCIENCE AND EDUCATION (ICSSE 2013), PT 2, 2013, 47 : 171 - 174