Using of n-grams from morphological tags for fake news classification

Cited by: 0
Authors
Kapusta J. [1]
Drlik M. [1]
Munk M. [1, 2]
Affiliations
[1] Department of Informatics, Constantine the Philosopher University in Nitra, Nitra
[2] Science and Research Centre, University of Pardubice, Pardubice
Keywords
Computational Linguistics; Data Mining and Machine Learning; Fake news identification; Morphological analysis; Natural Language and Speech; Natural language processing; POS tagging; Text mining
DOI
10.7717/PEERJ-CS.624
Abstract
Research into techniques for effective fake news detection has become both necessary and attractive. These techniques draw on many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging have proven insufficient for fake news classification. However, in the last decade they have not published any empirical results that would confirm these statements experimentally. Considering this contradiction, the main aim of the paper is to experimentally evaluate the potential of the combined use of n-grams and POS tags for correctly classifying fake and true news. A dataset of published fake and real news about the current Covid-19 pandemic was pre-processed using morphological analysis. As a result, n-grams of POS tags were prepared and further analysed. Three techniques based on POS tags were proposed and applied to different groups of n-grams in the pre-processing phase of fake news detection. First, the n-gram size was examined. Subsequently, the most suitable depth of the decision trees for sufficient generalization was determined. Finally, the performance measures of models based on the proposed techniques were compared with the standard reference TF-IDF technique. The performance measures considered were accuracy, precision, recall and F1-score, evaluated with 10-fold cross-validation. In addition, the question of whether the TF-IDF technique can be improved using POS tags was examined in detail. The results showed that the newly proposed techniques are comparable with the traditional TF-IDF technique. At the same time, morphological analysis can improve the baseline TF-IDF technique: two performance measures of the model, precision for fake news and recall for real news, improved statistically significantly. © 2021 Kapusta et al. All Rights Reserved.
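The pre-processing idea in the abstract, building n-grams from POS tags instead of words and then weighting them with TF-IDF, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The toy sentences and their hand-assigned tags are hypothetical, as are the function names `tag_ngrams` and `tfidf`. The TF-IDF variant used here (relative frequency times log(N/df)) is one common textbook form and is an assumption, since the abstract does not specify the formula.

```python
import math
from collections import Counter


def tag_ngrams(tags, n):
    """Return all contiguous n-grams (as tuples) over a sequence of POS tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def tfidf(docs_ngrams):
    """Weight each n-gram per document as tf * idf.

    tf  = count of the n-gram in the document / number of n-grams in it
    idf = log(number of documents / number of documents containing the n-gram)
    (Assumed variant; real toolkits often add smoothing.)
    """
    n_docs = len(docs_ngrams)
    doc_freq = Counter()
    for grams in docs_ngrams:
        doc_freq.update(set(grams))  # count each n-gram once per document

    weights = []
    for grams in docs_ngrams:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for very short texts
        weights.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return weights


# Hypothetical, hand-tagged toy sentences; not taken from the paper's dataset.
tagged_docs = [
    [("Vaccines", "NOUN"), ("reduce", "VERB"), ("severe", "ADJ"), ("illness", "NOUN")],
    [("Secret", "ADJ"), ("cure", "NOUN"), ("destroys", "VERB"), ("virus", "NOUN")],
]

# Discard the words and keep only the tag sequence of each document.
tag_sequences = [[tag for _, tag in doc] for doc in tagged_docs]

# Bigrams (n = 2) of POS tags; the paper examines several n-gram sizes.
bigrams = [tag_ngrams(tags, 2) for tags in tag_sequences]
weights = tfidf(bigrams)

for grams, doc_weights in zip(bigrams, weights):
    print(grams)
    print(doc_weights)
```

In a fuller pipeline the resulting weight dictionaries would be turned into feature vectors for a classifier such as the decision trees mentioned in the abstract. Tag n-grams shared by every document receive a weight of zero under this formula, which is why only the distinguishing tag patterns carry signal.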
Pages: 1 / 27
Number of pages: 26