Detecting Singleton Review Spammers Using Semantic Similarity

Cited by: 50
Authors
Sandulescu, Vlad [1, 3]
Ester, Martin [2]
Affiliations
[1] Adform, Copenhagen, Denmark
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
[3] Trustpilot, Copenhagen, Denmark
Keywords
opinion spam; fake review detection; semantic similarity; aspect-based opinion mining; latent Dirichlet allocation
DOI
10.1145/2740908.2742570
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Online reviews have become an increasingly important resource for consumers making purchases. However, it is becoming more and more difficult for people to make well-informed buying decisions without being deceived by fake reviews. Prior work on the opinion spam problem mostly classified fake reviews using behavioral user patterns. It focused on prolific users who write more than a couple of reviews, discarding one-time reviewers. Yet the number of singleton reviewers is expected to be high on many review websites. While behavioral patterns are effective for elite users, the review text must be exploited for one-time reviewers. In this paper we tackle the problem of detecting fake reviews written by the same person under multiple names, with each review posted under a different name. We propose two methods to detect similar reviews and show that they generally outperform the vectorial similarity measures used in prior work. The first method extends the semantic similarity between words to the review level. The second method is based on topic modeling and exploits the similarity of the reviews' topic distributions using two models: bag-of-words and bag-of-opinion-phrases. The experiments were conducted on reviews from three datasets: Yelp (57K reviews), Trustpilot (9K reviews) and the Ott dataset (800 reviews).
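The abstract contrasts a vectorial baseline with two review-similarity ideas: lifting word-level semantic similarity to whole reviews, and comparing reviews through their topic distributions. The Python sketch below illustrates each idea under stated assumptions and is not the authors' implementation. Assumed choices: bag-of-words cosine as the vectorial baseline; a symmetrized "best match per word" average as the rule for lifting word similarity to the review level; Jensen-Shannon divergence for comparing topic distributions. The names `SYNONYMS` and `toy_word_sim` are hypothetical stand-ins for a real lexical resource such as WordNet, and the topic vectors are hand-written rather than produced by a fitted LDA model.

```python
import math
from collections import Counter
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]

# HYPOTHETICAL synonym pairs standing in for a lexical resource such as WordNet.
SYNONYMS = {
    frozenset({"great", "excellent"}),
    frozenset({"staff", "employees"}),
    frozenset({"fast", "quick"}),
}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, strip punctuation (no stopword removal)."""
    tokens = (t.strip(".,!?;:\"'").lower() for t in text.split())
    return [t for t in tokens if t]


def cosine_bow(a: str, b: str) -> float:
    """Assumed vectorial baseline: cosine similarity of raw term-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(n * cb[w] for w, n in ca.items())
    norm = math.sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def toy_word_sim(w1: str, w2: str) -> float:
    """HYPOTHETICAL word similarity: 1.0 if identical, 0.8 for listed synonyms, else 0."""
    if w1 == w2:
        return 1.0
    return 0.8 if frozenset({w1, w2}) in SYNONYMS else 0.0


def _directional(src: list[str], tgt: list[str], word_sim: WordSim) -> float:
    """Average, over the words of src, of each word's best similarity to any word of tgt."""
    if not src or not tgt:
        return 0.0
    return sum(max(word_sim(w, v) for v in tgt) for w in src) / len(src)


def review_semantic_similarity(a: str, b: str, word_sim: WordSim = toy_word_sim) -> float:
    """Lift word-level similarity to the review level by averaging both directions."""
    ta, tb = tokenize(a), tokenize(b)
    return 0.5 * (_directional(ta, tb, word_sim) + _directional(tb, ta, word_sim))


def js_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """1 minus the Jensen-Shannon divergence (log base 2), so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i == 0 contribute 0; y_i > 0 whenever x_i > 0 since y is the mixture m.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


if __name__ == "__main__":
    r1 = "Great staff and fast service!"
    r2 = "Excellent employees and quick service."
    print(round(cosine_bow(r1, r2), 2))                  # 0.4
    print(round(review_semantic_similarity(r1, r2), 2))  # 0.88
    # Made-up topic vectors; in practice these would come from a fitted LDA model.
    print(round(js_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]), 3))
```

The two example reviews share only "and" and "service", so the bag-of-words cosine stays low, while the word-to-review lifting credits the synonym pairs. This gap is the kind of paraphrase a spammer writing under several names might produce. Stopword removal and weighting (for example by idf) are omitted here for brevity; a realistic pipeline would likely need them.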
Pages: 971-976
Page count: 6
Related papers
50 in total
  • [41] Semantic text similarity using corpus-based word similarity and string similarity
    University of Ottawa
    Unknown
    ACM Transactions on Knowledge Discovery from Data, 2008, 2 (02)
  • [42] A comprehensive review of stacking methods for semantic similarity measurement
    Martinez-Gil, Jorge
    MACHINE LEARNING WITH APPLICATIONS, 2022, 10
  • [43] Semantic Document Clustering Using a Similarity Graph
    Stanchev, Lubomir
    2016 IEEE TENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2016, : 1 - 8
  • [44] Question Similarity Detection in Turkish Using Semantic Textual Similarity Methods
    Yildiz, Eray
    Findik, Yasin
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [45] Efficient Textual Similarity using Semantic MinHashing
    Nawaz, Waqas
    Baig, Maryam
    Khan, Kifayat Ullah
    2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024, : 262 - 269
  • [46] Hypertext construction using statistical and semantic similarity
    Shin, D
    Nam, S
    Kim, M
    ACM DIGITAL LIBRARIES '97, 1997, : 57 - 63
  • [47] Sentence Semantic Similarity Using Dependency Parsing
    Vakare, Tanmay
    Verma, Kshitij
    Jain, Vedant
    2019 10TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2019,
  • [48] Effective semantic search using thematic similarity
    Khan, Sharifullah
    Mustafa, Jibran
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2014, 26 (02) : 161 - 169
  • [49] Ranking of Web Documents using Semantic Similarity
    Chahal, Poonam
    Singh, Manjeet
    Kumar, Suresh
    PROCEEDINGS OF THE 2013 INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS AND COMPUTER NETWORKS (ISCON), 2013, : 145 - 150
  • [50] An insight into semantic similarity aspects using WordNet
    Sharan A.
    Joshi M.L.
    International Journal of Information and Communication Technology, 2010, 2 (04) : 331 - 341