Towards Web Spam Filtering using a Classifier based on the Minimum Description Length Principle

Authors
Silva, Renato M. [1]
Yamakami, Akebo [1]
Almeida, Tiago A. [2]
Affiliations
[1] Univ Campinas UNICAMP, Sch Elect & Comp Engn, Sao Paulo, Brazil
[2] Fed Univ Sao Carlos UFSCar, Dept Comp Sci, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
DOI
10.1109/ICMLA.2016.170
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The steady growth and popularization of the Web have led spammers to develop techniques that circumvent search engines in order to gain good visibility for their web pages in search results. Such spam is responsible for serious problems such as user dissatisfaction, irritation, exposure to unpleasant or malicious content, and financial loss. Although different machine learning approaches have been used to detect web spam, many of them suffer from the curse of dimensionality or incur a very high computational cost, which impedes their use in real scenarios. Consequently, there is still considerable effort to develop more advanced methods that both prevent overfitting and learn quickly. To fill this gap, we present MDLClass, a classification technique based on the minimum description length principle, applied to web spam filtering. The proposed method is efficient, lightweight, multi-class, and fast. We also evaluate a new approach to web spam detection that combines the predictions obtained by classifiers using content-based, link-based, and transformed link-based features. In our experiments, we used two large, real, public datasets: WEBSPAM-UK2006 and WEBSPAM-UK2007. The results indicate that both the proposed MDLClass and the ensemble of predictions using different feature types are promising for web spam filtering.
Pages: 470-475
Number of pages: 6