Methods for cross-language plagiarism detection

Cited by: 48
Authors
Barron-Cedeno, Alberto [1, 2]
Gupta, Parth [3]
Rosso, Paolo [3]
Affiliations
[1] Univ Politecn Cataluna, Talp Res Ctr, E-08028 Barcelona, Spain
[2] Univ Politecn Madrid, Fac Informat, E-28040 Madrid, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Valencia, Spain
Keywords
Automatic plagiarism detection; Cross-language plagiarism; Plagiarism detection architecture; Cross-language similarity; Text re-use analysis; RETRIEVAL;
DOI
10.1016/j.knosys.2013.06.018
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Plagiarism across languages is on the rise for three reasons: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language different from their native one. Most efforts for automatically detecting cross-language plagiarism depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T + MA); the three models differ inherently in nature and in the resources they require. The three models are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T + MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired. Crown Copyright (C) 2013 Published by Elsevier B.V. All rights reserved.
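Of the three models named in the abstract, CL-CNG is the simplest to illustrate, since it needs no translation resources. The following is a minimal sketch of the general idea only: it compares two texts by the cosine similarity of their character n-gram counts. The choice of n = 3, the normalization (lowercasing, stripping diacritics and non-alphanumeric characters), and the function names `char_ngrams`, `cosine_similarity`, and `cl_cng_similarity` are assumptions made for this example; this record does not state the paper's actual settings.

```python
import math
import re
import unicodedata
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after a simple normalization.

    Assumed normalization (not taken from the paper): lowercase,
    strip diacritics, and keep only ASCII letters and digits.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    normalized = re.sub(r"[^a-z0-9]", "", stripped)
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text produced no n-grams
    return dot / (norm_a * norm_b)


def cl_cng_similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Character n-gram similarity of two texts, possibly in different languages."""
    return cosine_similarity(char_ngrams(doc_a, n), char_ngrams(doc_b, n))


if __name__ == "__main__":
    english = "automatic plagiarism detection"
    spanish = "detección automática de plagio"
    unrelated = "the weather is nice"
    print(cl_cng_similarity(english, spanish))
    print(cl_cng_similarity(english, unrelated))
```

A measure of this kind relies on shared character sequences, so it can only be expected to help for language pairs with overlapping vocabulary and script, such as the English and Spanish phrases in the usage example.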
Pages: 211-217
Page count: 7