Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Cited by: 1
Authors
Flores, Enrique [1]
Barron-Cedeno, Alberto [2]
Moreno, Lidia [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, E-46022 Valencia, Spain
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis
DOI
Not available
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high, and even source code is easily found through a simple Web search. There is therefore a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled versions. For cross-language source code re-use, such traditional approaches can handle only the programming languages supported by the compiler. We assume that source code is a piece of text with its own syntax and structure, so we aim to apply models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach yields slightly better results than the other models and distinguishes between re-used and related source codes with high performance.
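To make the idea of treating source code as text concrete, the following is a minimal sketch of an LSA-style similarity measure, not the authors' implementation. It assumes a raw term-count matrix over lowercased identifier tokens, a truncated SVD via NumPy, and cosine similarity in the latent space; the names `tokenize` and `lsa_similarity`, the tokenizer regex, and the default `k` are all hypothetical choices made for illustration.

```python
import re
import numpy as np


def tokenize(code):
    # Assumption: keep only lowercased alphabetic/underscore runs
    # (keywords and identifiers); operators and digits are dropped.
    return re.findall(r"[a-z_]+", code.lower())


def lsa_similarity(docs, k=2):
    """Return a (n_docs, n_docs) cosine-similarity matrix in a k-dim latent space.

    Assumes `docs` is a non-empty list of strings with at least one token overall.
    """
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-document matrix of raw counts: rows are terms, columns are documents.
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            X[index[t], j] += 1.0

    # Truncated SVD: keep the k largest singular values and vectors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    D = Vt[:k].T * s[:k]  # each row is one document in the latent space

    # Normalize rows so that the dot product is the cosine similarity.
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against documents with no latent mass
    D = D / norms
    return D @ D.T


if __name__ == "__main__":
    c_code = "int total = 0; for (int i = 0; i < n; i++) { total += values[i]; }"
    java_code = "int total = 0; for (int i = 0; i < n; i++) { total += values.get(i); }"
    unrelated = "print(open(path).read().upper())"
    print(lsa_similarity([c_code, java_code, unrelated], k=2))
```

In a realistic setting the matrix would be built from a much larger collection, likely with a weighting scheme such as tf-idf, and `k` would be tuned on held-out data; those choices are outside what the abstract specifies.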
Pages: 1708-1725
Page count: 18