Effect of Various Factors in Context of Feature Selection on Opinion Spam Detection

Cited by: 2
Authors
Rastogi, Ajay [1]
Mehrotra, Monica [1]
Ali, Syed Shafat [1]
Affiliations
[1] Jamia Millia Islamia, Dept Comp Sci, New Delhi, India
Keywords
feature selection; opinion spam; online reviews; classification; filter-based; model-based
DOI
10.1109/Confluence51648.2021.9377056
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the growing popularity of online reviews, spammers often target specific products or services with the aim of misleading consumers in their purchase decisions. This has opened the door for researchers to study the problem of opinion spam detection. To date, many effective and efficient solutions have been proposed using various types of features. However, most feature engineering tasks extract thousands of features, which may degrade performance and increase the computational cost of many machine learning algorithms. Feature selection methods can greatly improve classification performance while reducing the computational cost of model training. In this paper, we investigate the effect of different feature selection techniques on opinion spam detection. To this end, various feature selection methods (filter-based and model-based) with varying numbers of features are employed to train four different classification models. In addition, three well-known review datasets from different domains (hotel, doctor and restaurant) and four different types of features, viz., unigram, bigram, part-of-speech frequency count and word embedding, are used to examine the impact of the different factors that contribute to improved performance in the opinion spam domain. Our experimental results demonstrate how these factors affect classification performance and cost, and the effects are statistically validated using the Analysis of Variance (ANOVA) test.
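To make the idea of filter-based selection on unigram features concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which filter criterion was used, so the chi-square statistic here is an assumption, and the four toy reviews, their labels, and the function names `chi2_scores` and `select_top_k` are invented purely for illustration.

```python
from collections import Counter


def chi2_scores(docs, labels):
    """Chi-square score of each unigram's presence against a binary label.

    labels: 1 = spam review, 0 = genuine review (hypothetical encoding).
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    present_pos, present_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        # Count each unigram at most once per document (presence, not frequency).
        for tok in set(text.lower().split()):
            (present_pos if y == 1 else present_neg)[tok] += 1

    scores = {}
    for tok in set(present_pos) | set(present_neg):
        a = present_pos[tok]   # spam docs containing tok
        b = present_neg[tok]   # genuine docs containing tok
        c = n_pos - a          # spam docs without tok
        d = n_neg - b          # genuine docs without tok
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[tok] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring unigrams; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data, invented for illustration only (not from the paper's datasets).
docs = [
    "amazing hotel amazing stay",
    "amazing luxury experience",
    "room was clean",
    "staff was friendly",
]
labels = [1, 1, 0, 0]

scores = chi2_scores(docs, labels)
selected = select_top_k(scores, 2)
print(selected)
```

A filter of this kind scores each feature independently of any classifier, which is what distinguishes it from the model-based methods the abstract mentions, where feature importance comes from a trained model.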
Pages: 778-783
Number of pages: 6