Methods for cross-language plagiarism detection

Cited by: 48
Authors
Barron-Cedeno, Alberto [1, 2]
Gupta, Parth [3]
Rosso, Paolo [3]
Affiliations
[1] Univ Politecn Cataluna, Talp Res Ctr, E-08028 Barcelona, Spain
[2] Univ Politecn Madrid, Fac Informat, E-28040 Madrid, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Valencia, Spain
Keywords
Automatic plagiarism detection; Cross-language plagiarism; Plagiarism detection architecture; Cross-language similarity; Text re-use analysis; RETRIEVAL;
DOI
10.1016/j.knosys.2013.06.018
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Three reasons explain why plagiarism across languages is on the rise: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language different from their native one. Most efforts to automatically detect cross-language plagiarism depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T + MA), three models that differ inherently in nature and in the resources they require. The three models are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T + MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired. Crown Copyright (C) 2013 Published by Elsevier B.V. All rights reserved.
Pages: 211-217
Page count: 7