Web Spam Detection Based on Improved Tri-training

Times cited: 0
Author
Li, Hailong [1]
Affiliation
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Keywords
web spam; search engine; web spam detection; tri-training; co-training; feature view
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Web spamming is the deliberate manipulation of search engine indexes to give a page a higher ranking than its true value deserves. As web spam has evolved, web spam detection methods based on machine learning, which are capable of self-learning, have emerged. Web spam detection is treated as a binary classification problem. Because labeled training examples are expensive to obtain, requiring both domain experts and considerable labor, making full use of the large number of unlabeled web pages available on the web is a challenge for web spam detection. In this paper, we present a web spam detection algorithm based on improved tri-training. It trains classifiers on a small number of labeled examples together with a large number of unlabeled examples, which reduces the cost of labeling and improves learning performance. The method uses both web page content features and link features.
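The abstract does not say what the improvement to tri-training is, so the following is only a minimal sketch of a simplified standard tri-training loop for the binary spam/normal setting described above: three classifiers start from different bootstrap samples of the labeled pages, and each is retrained on unlabeled pages that the other two label identically, but only while the other two's estimated joint error on the labeled set keeps falling. The nearest-centroid base learner, the two-number feature vectors standing in for content and link features, and the names `NearestCentroid`, `tri_train`, `joint_error`, and `vote` are hypothetical illustrations, not the paper's method. The sketch also omits the size checks and subsampling of pseudo-labeled data used in the original tri-training algorithm.

```python
# Simplified tri-training sketch. The base learner and features are hypothetical stand-ins.
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Toy base learner (an assumption): predicts the label whose centroid is closest."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            # Column-wise mean of the rows that carry this label.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: Vector) -> int:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def joint_error(hj, hk, X: List[Vector], y: List[int]) -> float:
    """Error of h_j on the labeled examples where h_j and h_k agree."""
    agreed = []
    for x, t in zip(X, y):
        pj = hj.predict(x)
        if pj == hk.predict(x):
            agreed.append(pj != t)
    return sum(agreed) / len(agreed) if agreed else 1.0


def tri_train(X_lab, y_lab, X_unl, max_rounds=10, seed=0):
    rng = random.Random(seed)
    n = len(X_lab)
    clfs = []
    for _ in range(3):
        # A bootstrap sample of the labeled pages gives each classifier a different start.
        idx = [rng.randrange(n) for _ in range(n)]
        clfs.append(NearestCentroid().fit([X_lab[i] for i in idx], [y_lab[i] for i in idx]))
    prev_err = [0.5, 0.5, 0.5]
    for _ in range(max_rounds):
        updates: List[Tuple[int, List[Vector], List[int]]] = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            err = joint_error(clfs[j], clfs[k], X_lab, y_lab)
            if err < prev_err[i]:
                # Pseudo-label the unlabeled pages on which h_j and h_k agree.
                Xp, yp = [], []
                for x in X_unl:
                    pj = clfs[j].predict(x)
                    if pj == clfs[k].predict(x):
                        Xp.append(x)
                        yp.append(pj)
                updates.append((i, Xp, yp))
                prev_err[i] = err
        if not updates:
            break  # no classifier's error estimate improved, so stop
        # Retrain only after all pseudo-label sets were built from last round's models.
        for i, Xp, yp in updates:
            clfs[i] = NearestCentroid().fit(list(X_lab) + Xp, list(y_lab) + yp)
    return clfs


def vote(clfs, x: Vector) -> int:
    """Majority vote of the three classifiers (1 = spam, 0 = normal)."""
    votes = [c.predict(x) for c in clfs]
    return max(set(votes), key=votes.count)


if __name__ == "__main__":
    rng = random.Random(42)

    def page(spam: bool) -> List[float]:
        # Hypothetical 2-D features: [content score, link score].
        c = 0.8 if spam else 0.2
        return [rng.gauss(c, 0.1), rng.gauss(c, 0.1)]

    y_lab = [0, 1] * 5
    X_lab = [page(bool(t)) for t in y_lab]
    X_unl = [page(rng.random() < 0.5) for _ in range(200)]
    clfs = tri_train(X_lab, y_lab, X_unl)
    print(vote(clfs, [0.9, 0.85]), vote(clfs, [0.15, 0.1]))
```

Because every classifier's new pseudo-labeled set is computed from the previous round's models before any of them is retrained, no classifier in this sketch learns from labels produced by a model updated in the same round. The final decision is a majority vote, which always exists with three voters and two classes.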
Pages: 61-65
Number of pages: 5
Related papers
  • [1] An Improved Social Spammer Detection Based on Tri-training
    Xu, Guangxia
    Zhao, Jingteng
    Huang, Deling
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 4040-4042
  • [2] Detecting the Spam Review Using Tri-training
    Ji Chengzhang
    Kang, Dae-Ki
    2015 17TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT), 2015: 374-377
  • [3] An Improved Algorithm for Relation Extraction Based on Tri-Training
    Zhong, Zhinong
    Liu, FangChi
    Wu, Ye
    Jing, Ning
    2014 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE, ELECTRONICS AND ELECTRICAL ENGINEERING (ISEEE), VOLS 1-3, 2014: 1077-1080
  • [4] Improved Tri-training with Unlabeled Data
    Guo, Tao
    Li, Guiyang
    SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING: THEORY AND PRACTICE, VOL 2, 2012, 115: 139-147
  • [5] Improved Fake Reviews Detection Model Based on Vertical Ensemble Tri-Training and Active Learning
    Yin, Chunyong
    Cuan, Haoqi
    Zhu, Yuhang
    Yin, Zhichao
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2021, 12(3)
  • [6] Semi-supervised PolSAR Classification Based on Improved Tri-training
    Hua, Wenqiang
    Wang, Shuang
    Zhao, Yang
    Yue, Bo
    Guo, Yanhe
    2017 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2017: 3937-3940
  • [7] SEMI-SUPERVISED ACOUSTIC EVENT DETECTION BASED ON TRI-TRAINING
    Shi, Bowen
    Sun, Ming
    Kao, Chieh-Chi
    Rozgic, Viktor
    Matsoukas, Spyros
    Wang, Chao
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 750-754
  • [8] A Novel Semi-supervised Adaboost Technique Based on Improved Tri-training
    Li, Dunming
    Mao, Jenwen
    Shen, Fuke
    INFORMATION SECURITY AND PRIVACY, ACISP 2019, 2019, 11547: 669-678
  • [9] Boosted Web Named Entity Recognition via Tri-Training
    Chou, Chien-Lung
    Chang, Chia-Hui
    Huang, Ya-Yun
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 16(2)
  • [10] A Tri-training based Transfer Learning Algorithm
    Liu, Xiaobo
    Zhang, Harry
    Cai, Zhihua
    Wang, Guangjun
    2012 IEEE 24TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2012), VOL 1, 2012: 698-703