Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Cited by: 1
Authors
Flores, Enrique [1]
Barron-Cedeno, Alberto [2]
Moreno, Lidia [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, E-46022 Valencia, Spain
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple Web search, so there is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, traditional approaches can handle only the programming languages supported by the compiler. We assume that source code is a piece of text with its own syntax and structure, so we aim to apply models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and distinguishes between re-used and related source codes with high performance.
Pages: 1708-1725
Number of pages: 18
    ALSIC-APPRENTISSAGE DES LANGUES ET SYSTEMS D INFORMATION ET DE COMMUNICATION, 2005, 8 (02): : 135 - 146