Methods for cross-language plagiarism detection

Cited by: 48
Authors
Barrón-Cedeño, Alberto [1, 2]
Gupta, Parth [3]
Rosso, Paolo [3]
Affiliations
[1] Univ Politecn Cataluna, Talp Res Ctr, E-08028 Barcelona, Spain
[2] Univ Politecn Madrid, Fac Informat, E-28040 Madrid, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Valencia, Spain
Keywords
Automatic plagiarism detection; Cross-language plagiarism; Plagiarism detection architecture; Cross-language similarity; Text re-use analysis; RETRIEVAL;
DOI
10.1016/j.knosys.2013.06.018
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Three factors explain why cross-language plagiarism is on the rise: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people living in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language other than their native one. Most efforts to detect cross-language plagiarism automatically depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for cross-language plagiarism detection that covers the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T + MA), which differ inherently in their nature and in the resources they require. The three models are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T + MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired. Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved.
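The abstract names CL-CNG without describing it. As a rough illustration only, the sketch below compares two texts by the cosine of their character 3-gram frequency vectors after lowercasing and stripping diacritics, which is the general idea behind character n-gram similarity across related languages. The normalization steps, the choice n = 3, raw term-frequency weighting, and the function names are assumptions made for this example, not the authors' implementation. Because everything outside a-z and 0-9 is discarded, the sketch only makes sense for languages that share the Latin alphabet.

```python
import math
import re
import unicodedata
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip diacritics, and keep only a-z, 0-9 and single spaces (assumed scheme)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents).strip()


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the normalized text."""
    s = normalize(text)
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # empty or too-short text: no n-grams to compare
    return dot / (norm_a * norm_b)


def cl_cng_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Hypothetical helper: character n-gram cosine similarity of two texts."""
    return cosine(char_ngrams(text_a, n), char_ngrams(text_b, n))


if __name__ == "__main__":
    # Spanish/English cognates share 6 of their 9 character 3-grams each.
    print(round(cl_cng_similarity("Información", "information"), 3))  # → 0.667
```

A model of this kind needs no translation resources at all, which is why it works only when the two languages share much of their surface vocabulary; CL-ASA and T + MA, by contrast, rely on bilingual dictionaries or machine translation.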
Pages: 211-217
Page count: 7