Towards Web Spam Filtering using a Classifier based on the Minimum Description Length Principle

Authors
Silva, Renato M. [1]
Yamakami, Akebo [1]
Almeida, Tiago A. [2]
Affiliations
[1] Univ Campinas UNICAMP, Sch Elect & Comp Engn, Sao Paulo, Brazil
[2] Fed Univ Sao Carlos UFSCar, Dept Comp Sci, Sao Paulo, Brazil
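Source
2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016) | 2016
Funding
São Paulo Research Foundation (FAPESP), Brazil
DOI
10.1109/ICMLA.2016.170
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract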
The steady growth and popularization of the Web have led spammers to develop techniques that circumvent search engines in order to gain good visibility for their web pages in search results. These techniques cause serious problems such as user dissatisfaction, irritation, exposure to unpleasant or malicious content, and financial loss. Although various machine learning approaches have been used to detect web spam, many of them suffer from the curse of dimensionality or have a very high computational cost, which impedes their use in real scenarios. Consequently, there is still considerable effort to develop more advanced methods that both avoid overfitting and learn quickly. To fill this gap, we present MDLClass, a classification technique based on the minimum description length principle, applied to the context of web spam filtering. The proposed method is very efficient, lightweight, multi-class, and fast. We also evaluated a new approach to detecting web spam that combines the predictions obtained by classifiers using content-based, link-based, and transformed link-based features. In our experiments, we employed two real, public, and large datasets: WEBSPAM-UK2006 and WEBSPAM-UK2007. The results indicate that the proposed MDLClass and the ensemble of predictions using different types of features are promising for the task of web spam filtering.
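The abstract does not spell out how MDLClass computes description lengths or how the three sets of predictions are combined, so the following is only a toy illustration of the general idea, not the authors' method. It assumes each sample is a list of discrete tokens, models each class with Laplace-smoothed token frequencies, and assigns the class whose model encodes the sample in the fewest bits. The class `ToyMDLClassifier` and the helper `majority_vote` are hypothetical names, and majority voting is an assumed combination rule.

```python
import math
from collections import Counter, defaultdict


class ToyMDLClassifier:
    """Toy MDL-style classifier (illustrative only, not the paper's MDLClass)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # class -> token frequencies
        self.total_tokens = Counter()             # class -> number of tokens seen
        self.vocabulary = set()

    def fit(self, samples, labels):
        # Accumulate per-class token statistics in a single pass.
        for tokens, label in zip(samples, labels):
            self.token_counts[label].update(tokens)
            self.total_tokens[label] += len(tokens)
            self.vocabulary.update(tokens)
        return self

    def description_length(self, tokens, label):
        # Code length in bits: sum of -log2 p(token | class), with Laplace
        # smoothing and one extra vocabulary slot reserved for unseen tokens.
        vocab_size = len(self.vocabulary) + 1
        denom = self.total_tokens[label] + vocab_size
        bits = 0.0
        for token in tokens:
            p = (self.token_counts[label][token] + 1) / denom
            bits += -math.log2(p)
        return bits

    def predict(self, tokens):
        # Choose the class that yields the shortest description of the sample.
        return min(
            self.token_counts,
            key=lambda label: self.description_length(tokens, label),
        )


def majority_vote(predictions):
    """Assumed combination rule: return the most frequent predicted label."""
    return Counter(predictions).most_common(1)[0][0]
```

Under these assumptions, one such classifier could be trained per feature view (content-based, link-based, transformed link-based) and the three predicted labels passed to `majority_vote`. Real-valued link features would first need to be discretized, which this sketch does not cover.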
Pages: 470-475
Page count: 6