Effect of Various Factors in Context of Feature Selection on Opinion Spam Detection

Cited by: 2
Authors
Rastogi, Ajay [1]
Mehrotra, Monica [1]
Ali, Syed Shafat [1]
Affiliations
[1] Jamia Millia Islamia, Dept Comp Sci, New Delhi, India
Keywords
feature selection; opinion spam; online reviews; classification; filter-based; model-based;
DOI
10.1109/Confluence51648.2021.9377056
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
With the growing popularity of online reviews, spammers often target specific products or services with the aim of misleading consumers in their purchase decisions. This has prompted researchers to study the problem of opinion spam detection. To date, many effective and efficient solutions have been proposed using various types of features. However, most feature engineering approaches extract thousands of features, which can degrade performance and increase the computational cost of many machine learning algorithms. Feature selection methods can greatly improve classification performance while reducing the computational cost of model training. In this paper, we investigate the effect of different feature selection techniques on opinion spam detection. To this end, various feature selection methods (filter-based and model-based) with varying numbers of features are employed to train four different classification models. In addition, three well-known review datasets from different domains (hotel, doctor and restaurant) and four different types of features, viz., unigram, bigram, part-of-speech frequency count and word embedding, are used to examine the impact of different factors on performance in the opinion spam domain. Our experimental results demonstrate how these factors affect classification performance and cost, and the findings are statistically validated using an Analysis of Variance (ANOVA) test.
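To make the idea of filter-based selection over unigram features concrete, here is a minimal sketch in plain Python. The abstract does not name the specific filters used, so the chi-square score is an assumption chosen for illustration; the toy reviews, the labels (1 = spam, 0 = genuine), and the helper names `chi2_scores` and `select_top_k` are likewise hypothetical and not taken from the paper.

```python
from collections import Counter


def chi2_scores(docs, labels):
    """Chi-square score of each unigram (presence/absence) against a binary label."""
    n = len(docs)
    n_pos = sum(labels)          # number of spam reviews (label 1)
    n_neg = n - n_pos            # number of genuine reviews (label 0)

    # Document frequency of each token within each class.
    df_pos, df_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        tokens = set(text.lower().split())
        (df_pos if y == 1 else df_neg).update(tokens)

    scores = {}
    for tok in set(df_pos) | set(df_neg):
        a = df_pos[tok]          # spam reviews containing the token
        b = df_neg[tok]          # genuine reviews containing the token
        c = n_pos - a            # spam reviews without the token
        d = n_neg - b            # genuine reviews without the token
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A token present in every review carries no information: score 0.
        scores[tok] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring tokens; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:k]]


# Hypothetical toy corpus, for illustration only.
docs = [
    "best hotel ever amazing stay",
    "amazing staff best price ever",
    "room was clean and quiet",
    "staff was polite room was small",
]
labels = [1, 1, 0, 0]

scores = chi2_scores(docs, labels)
selected = select_top_k(scores, 5)
print(selected)
```

A filter of this kind ranks features independently of any classifier, which is what distinguishes it from model-based selection, where a trained model's weights or importances decide which features to keep. Varying `k` corresponds to the "varying numbers of features" factor described in the abstract.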
Pages: 778-783
Number of pages: 6
Related papers
50 in total
  • [21] A Genetic Programming Approach to Feature Selection and Construction for Ransomware, Phishing and Spam Detection
    Al-Sahaf, Harith
    Welch, Ian
    PROCEEDINGS OF THE 2019 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION (GECCO'19 COMPANION), 2019, : 332 - 333
  • [22] Review Spam Detection Using Opinion Mining
    Narayan, Rohit
    Rout, Jitendra Kumar
    Jena, Sanjay Kumar
    PROGRESS IN INTELLIGENT COMPUTING TECHNIQUES: THEORY, PRACTICE, AND APPLICATIONS, VOL 2, 2018, 719 : 273 - 279
  • [23] Fact or Factitious? Contextualized Opinion Spam Detection
    Kennedy, Stefan
    Walsh, Niall
    Sloka, Kirils
    McCarren, Andrew
    Foster, Jennifer
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019, : 344 - 350
  • [24] Opinion Mining using Ontological Spam Detection
    Duhan, Neelam
    Divya
    Mittal, Mamta
    2017 INTERNATIONAL CONFERENCE ON INFOCOM TECHNOLOGIES AND UNMANNED SYSTEMS (TRENDS AND FUTURE DIRECTIONS) (ICTUS), 2017, : 557 - 562
  • [25] Web spam detection with feature fusion
    Geng, Guanggang
    Zhu, Pengfei
    Wang, Deliang
    Journal of Computational Information Systems, 2009, 5 (03) : 1511 - 1519
  • [26] Feature Selection and Support Vector Machine Hyper-parameter Optimisation for Spam Detection
    Diale, Melvin
    Van der Walt, Christiaan
    Celik, Turgay
    Modupe, Abiodun
    2016 PATTERN RECOGNITION ASSOCIATION OF SOUTH AFRICA AND ROBOTICS AND MECHATRONICS INTERNATIONAL CONFERENCE (PRASA-ROBMECH), 2016,
  • [27] GANs for Semi-Supervised Opinion Spam Detection
    Stanton, Gray
    Irissappane, Athirai A.
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5204 - 5210
  • [28] Deceptive opinion spam detection approaches: a literature survey
    Sushil Kumar Maurya
    Dinesh Singh
    Ashish Kumar Maurya
    Applied Intelligence, 2023, 53 : 2189 - 2234
  • [29] An ensemble approach for spam detection in Arabic opinion texts
    Saeed, Radwa M. K.
    Rady, Sherine
    Gharib, Tarek F.
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (01) : 1407 - 1416
  • [30] RLOSD: Representation Learning based Opinion Spam Detection
    Sedighi, Zeinab
    Ebrahimpour-Komleh, Hossein
    Bagheri, Ayoub
    2017 3RD IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2017, : 74 - 80