Unmasking text plagiarism using syntactic-semantic based natural language processing techniques: Comparisons, analysis and challenges

Cited by: 30
Authors
Vani, K. [1]
Gupta, Deepa [2]
Affiliations
[1] Amrita Univ, Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Dept Comp Sci & Engn, Bengaluru, India
[2] Amrita Univ, Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Dept Math, Bengaluru, India
Keywords
Natural language processing; Plagiarism detection; Syntactic-semantic; POS tagging; Chunking; Semantic role labelling
DOI
10.1016/j.ipm.2018.01.008
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The proposed work explores and compares the effectiveness of syntactic-semantic linguistic structures for plagiarism detection using natural language processing techniques. It examines linguistic features, namely part-of-speech tags, chunks and semantic roles, for detecting plagiarized fragments, and uses a combined syntactic-semantic similarity metric that extracts semantic concepts from the WordNet lexical database. The linguistic information is used both for effective pre-processing and for enabling semantically relevant comparisons. Another major contribution is the analysis of the proposed approach on plagiarism cases of varying complexity levels: the impact of plagiarism types and complexity levels on the extracted features is analyzed and discussed. Further, unlike existing systems, which were evaluated on limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpora provided by the PAN competition from 2009 to 2014. The approach showed considerable improvement over the top-ranked systems of the respective years. The evaluation and analysis across various cases of plagiarism also reflected the superiority of deeper linguistic features for identifying manually plagiarized data.
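The abstract describes combining syntactic information (POS tags) with WordNet-based semantic concepts into one similarity score. The following is a minimal Python sketch of one way such a combined metric could be computed. It is not the authors' metric. It assumes the input is already POS-tagged and lemmatized, and it uses a tiny hand-written concept dictionary (`TOY_CONCEPTS`, hypothetical) in place of WordNet. Content words are compared only with words carrying the same POS tag (the syntactic constraint), scored by overlap of their concept sets (the semantic part), and the best-match scores are averaged in both directions.

```python
from typing import Dict, List, Set, Tuple

# A sentence as (lemma, coarse POS tag) pairs; assumes tagging and lemmatization are done.
Tagged = List[Tuple[str, str]]

# Hypothetical stand-in for WordNet: lemma -> set of concept labels.
TOY_CONCEPTS: Dict[str, Set[str]] = {
    "car": {"vehicle"},
    "automobile": {"vehicle"},
    "buy": {"acquire"},
    "purchase": {"acquire"},
    "man": {"human", "male"},
    "person": {"human"},
}

# Only content words take part in the comparison (determiners etc. are skipped).
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def concepts(lemma: str) -> Set[str]:
    """Concept set for a lemma; unknown lemmas map to a singleton of themselves."""
    return TOY_CONCEPTS.get(lemma, {lemma})


def word_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two lemmas' concept sets (1.0 for identical lemmas)."""
    if a == b:
        return 1.0
    ca, cb = concepts(a), concepts(b)
    return len(ca & cb) / len(ca | cb)


def directional_similarity(src: Tagged, tgt: Tagged) -> float:
    """Average best match of each content word in src against same-POS words in tgt."""
    content = [(w, t) for w, t in src if t in CONTENT_TAGS]
    if not content:
        return 0.0
    total = 0.0
    for word, tag in content:
        # Syntactic constraint: only words with the same POS tag are compared.
        same_pos = [v for v, u in tgt if u == tag]
        total += max((word_similarity(word, v) for v in same_pos), default=0.0)
    return total / len(content)


def syntactic_semantic_similarity(s1: Tagged, s2: Tagged) -> float:
    """Symmetric score in [0, 1]: the mean of both directional scores."""
    return (directional_similarity(s1, s2) + directional_similarity(s2, s1)) / 2


if __name__ == "__main__":
    source = [("the", "DET"), ("man", "NOUN"), ("buy", "VERB"), ("a", "DET"), ("car", "NOUN")]
    suspect = [("a", "DET"), ("person", "NOUN"), ("purchase", "VERB"),
               ("an", "DET"), ("automobile", "NOUN")]
    print(round(syntactic_semantic_similarity(source, suspect), 3))  # 0.833
```

Here the paraphrase "a person purchased an automobile" scores 0.833 against "the man bought a car" even though no content word is shared, while the same word used with a different POS tag scores 0. A closer approximation of the approach in the abstract would replace `TOY_CONCEPTS` with real WordNet lookups and add chunk- and semantic-role-level alignment, which this sketch does not attempt.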
Pages: 408-432
Number of pages: 25