Unmasking text plagiarism using syntactic-semantic based natural language processing techniques: Comparisons, analysis and challenges

Cited by: 30
Authors
Vani, K. [1]
Gupta, Deepa [2]
Affiliations
[1] Amrita Univ, Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Dept Comp Sci & Engn, Bengaluru, India
[2] Amrita Univ, Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Dept Math, Bengaluru, India
Keywords
Natural language processing; Plagiarism detection; Syntactic-semantic; POS tagging; Chunking; Semantic role labelling
DOI
10.1016/j.ipm.2018.01.008
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This work explores and compares the effectiveness of syntactic-semantic linguistic structures for plagiarism detection using natural language processing techniques. It examines linguistic features, viz., part-of-speech tags, chunks and semantic roles, for detecting plagiarized fragments, and uses a combined syntactic-semantic similarity metric that extracts semantic concepts from the WordNet lexical database. The linguistic information is used for effective pre-processing and for making semantically relevant comparisons. Another major contribution is the analysis of the proposed approach on plagiarism cases of various complexity levels. The impact of plagiarism types and complexity levels on the extracted features is analyzed and discussed. Further, unlike existing systems, which were evaluated on limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpora provided by the PAN competitions from 2009 to 2014. The approach showed considerable improvement over the top-ranked systems of the respective years. The evaluation and analysis across various cases of plagiarism also reflected the superiority of deeper linguistic features for identifying manually plagiarized data.
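The abstract does not give the similarity metric itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes tokens are already lemmatized and POS-tagged, keeps only content words, and treats two words as matching when they share a POS tag and either the same lemma or a common concept. The `TOY_SYNSETS` table is a hypothetical stand-in for WordNet synsets, and the function names and the Dice-style score are illustrative choices.

```python
from typing import FrozenSet, List, Set, Tuple

# A tagged token is (lemma, coarse POS tag). Tagging is assumed done upstream.
Tagged = List[Tuple[str, str]]

# Hypothetical stand-in for WordNet synsets (assumption, not real WordNet data).
TOY_SYNSETS: List[FrozenSet[str]] = [
    frozenset({"buy", "purchase"}),
    frozenset({"car", "automobile"}),
    frozenset({"quick", "fast"}),
]

# Only content words take part in the comparison (a pre-processing assumption).
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def concepts(lemma: str) -> Set[int]:
    """Return the indices of the toy synsets that contain this lemma."""
    return {i for i, synset in enumerate(TOY_SYNSETS) if lemma in synset}


def tokens_match(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Match when POS tags agree and the lemmas are equal or share a concept."""
    (lemma_a, tag_a), (lemma_b, tag_b) = a, b
    if tag_a != tag_b:
        return False
    lemma_a, lemma_b = lemma_a.lower(), lemma_b.lower()
    return lemma_a == lemma_b or bool(concepts(lemma_a) & concepts(lemma_b))


def similarity(source: Tagged, suspicious: Tagged) -> float:
    """Dice-style overlap of content words under POS-aware concept matching."""
    src = [t for t in source if t[1] in CONTENT_TAGS]
    sus = [t for t in suspicious if t[1] in CONTENT_TAGS]
    if not src or not sus:
        return 0.0
    used: Set[int] = set()  # source positions already paired with a token
    matched = 0
    for token in sus:
        for j, candidate in enumerate(src):
            if j not in used and tokens_match(token, candidate):
                used.add(j)
                matched += 1
                break
    return 2 * matched / (len(src) + len(sus))


source = [("she", "PRON"), ("buy", "VERB"), ("a", "DET"),
          ("fast", "ADJ"), ("car", "NOUN")]
paraphrase = [("she", "PRON"), ("purchase", "VERB"), ("a", "DET"),
              ("quick", "ADJ"), ("automobile", "NOUN")]
unrelated = [("dog", "NOUN"), ("bark", "VERB")]

print(similarity(source, paraphrase))
print(similarity(source, unrelated))
```

Requiring the POS tags to agree before consulting the concept table is what makes the comparison syntactic as well as semantic: a paraphrase with substituted synonyms still scores highly, while a word reused in a different grammatical role does not count as a match. A real system would replace the toy table with a WordNet lookup and add the chunk and semantic-role features that the abstract mentions.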
Pages: 408-432
Number of pages: 25