Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Times cited: 1
Authors
Flores, Enrique [1]
Barron-Cedeno, Alberto [2]
Moreno, Lidia [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, E-46022 Valencia, Spain
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis
DOI
Not available
CLC classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Nowadays, the Internet is the main source of information, offering blogs, encyclopedias, discussion forums, source code repositories, and many other resources just one click away. The temptation to re-use these materials is very high; even source code is easily found through a simple Web search. Hence, there is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source code in its compiled form. For cross-language source code re-use, however, such traditional approaches can only handle the programming languages supported by the compiler. We treat source code as a piece of text, with its own syntax and structure, and therefore aim to apply models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and distinguishes between re-used and merely related source code with high performance.
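The abstract does not spell out the pipeline, so the following is only a minimal sketch of LSA-style cross-language similarity under stated assumptions, not the authors' implementation. Source files are treated as plain text, lower-cased, whitespace-collapsed, and indexed by character 3-grams (an assumed representation); the term-document matrix is reduced to k latent dimensions with a truncated SVD, and documents are compared by cosine similarity. A raw n-gram cosine is included as a generic stand-in for "previously used text re-use detection models"; it is not necessarily the baseline evaluated in the paper. All function names (`char_ngrams`, `cosine`, `ngram_cosine`, `lsa_doc_vectors`) are hypothetical, and the SVD is computed by power iteration so that the sketch needs only the Python standard library.

```python
import math
import re
from collections import Counter


def char_ngrams(code: str, n: int = 3) -> Counter:
    """Count character n-grams after lower-casing and collapsing whitespace.

    Assumption: character 3-grams serve as index terms; this is an
    illustrative choice, not necessarily the representation used in the paper.
    """
    text = re.sub(r"\s+", " ", code.lower()).strip()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ngram_cosine(a: str, b: str, n: int = 3) -> float:
    """Baseline: cosine of raw character n-gram count vectors (no latent space)."""
    pa, pb = char_ngrams(a, n), char_ngrams(b, n)
    vocab = sorted(set(pa) | set(pb))
    return cosine([pa[t] for t in vocab], [pb[t] for t in vocab])


def lsa_doc_vectors(docs, k: int = 2, n: int = 3, iters: int = 200):
    """Map each document to k latent dimensions (the rows of V_k * Sigma_k).

    Builds the term-document count matrix A, then extracts the top-k
    eigenpairs of the document Gram matrix G = A^T A by power iteration with
    deflation. Eigenvectors of G are the right singular vectors of A and its
    eigenvalues are the squared singular values, so this yields the document
    side of a rank-k truncated SVD.
    """
    profiles = [char_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*profiles))
    cols = [[p[t] for t in vocab] for p in profiles]  # one column of A per document
    m = len(docs)
    gram = [[float(sum(a * b for a, b in zip(cols[i], cols[j]))) for j in range(m)]
            for i in range(m)]
    coords = [[0.0] * k for _ in range(m)]
    for c in range(min(k, m)):
        v = [float(i + 1) for i in range(m)]  # fixed, non-uniform start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        gv = [sum(gram[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(a * b for a, b in zip(v, gv))  # Rayleigh quotient = eigenvalue
        if lam <= 1e-9:
            break  # no variance left: A has rank < k
        sigma = math.sqrt(lam)  # singular value of A
        for i in range(m):
            coords[i][c] = v[i] * sigma
        # Deflate so the next pass converges to the next-largest eigenpair.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return coords


if __name__ == "__main__":
    python_src = "def add(a, b):\n    return a + b"
    java_src = "int add(int a, int b) { return a + b; }"
    other_src = "for (String s : names) System.out.println(s);"
    vecs = lsa_doc_vectors([python_src, java_src, other_src], k=2)
    print("LSA    py-java:", cosine(vecs[0], vecs[1]), "py-other:", cosine(vecs[0], vecs[2]))
    print("n-gram py-java:", ngram_cosine(python_src, java_src),
          "py-other:", ngram_cosine(python_src, other_src))
```

With only three toy documents the printed numbers are purely illustrative. LSA is normally fitted on a large collection of source files with k much smaller than the number of documents; when k reaches the rank of A, the latent cosine simply reproduces the raw n-gram cosine. Power iteration is used here only to keep the sketch dependency-free; a library SVD routine would be the usual choice.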
Pages: 1708-1725
Number of pages: 18
Related papers (50 in total)
  • [1] On the Mono-and Cross-Language Detection of Text Re-Use and Plagiarism
    Barron Cedeno, Alberto
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2013, (50): 103-105
  • [2] Towards the Detection of Cross-Language Source Code Reuse
    Flores, Enrique
    Barron-Cedeno, Alberto
    Rosso, Paolo
    Moreno, Lidia
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2011, 6716: 250-253
  • [3] Query Expansion in Cross-Language Information Retrieval Using Latent Semantic Analysis
    Bi Jianting
    Su Yidan
    ICCSE 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION: ADVANCED COMPUTER TECHNOLOGY, NEW EDUCATION, 2008: 220-224
  • [4] Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)
    Zaharia, Sergiu
    Rebedea, Traian
    Trausan-Matu, Stefan
    APPLIED SCIENCES-BASEL, 2023, 13 (13)
  • [5] Flowchart-Based Cross-Language Source Code Similarity Detection
    Zhang, Feng
    Li, Guofan
    Liu, Cong
    Song, Qian
    SCIENTIFIC PROGRAMMING, 2020, 2020
  • [6] Cross-language Source Code Clone Detection Based On Graph Neural Network
    Zhang, Yuguo
    Yang, Jia
    Ruan, Ou
    PROCEEDINGS OF 2024 3RD INTERNATIONAL CONFERENCE ON CRYPTOGRAPHY, NETWORK SECURITY AND COMMUNICATION TECHNOLOGY, CNSCT 2024, 2024: 189-194
  • [7] Cross-Language Automatic Plagiarism Detector Using Latent Semantic Analysis and Self-Organizing Map
    Ratna, Anak Agung Putri
    Nabhastala, Paskalis Nandana Yestha
    Ibrahim, Ihsan
    Ekadiyanto, F. Astha
    Salman, Muhammad
    Herusaktiawan, Muhammad Yusuf Irfan
    Purnamasari, Prima Dewi
    AIVR 2018: 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND VIRTUAL REALITY, 2018: 83-87
  • [8] An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis
    Cosma, Georgina
    Joy, Mike
    IEEE TRANSACTIONS ON COMPUTERS, 2012, 61 (03): 379-394
  • [9] TF-IDF-INSPIRED DETECTION FOR CROSS-LANGUAGE SOURCE CODE PLAGIARISM AND COLLUSION
    Karnalim, Oscar
    COMPUTER SCIENCE-AGH, 2020, 21 (01): 113-136
  • [10] Cross-Language Code Similarity and Applications in Clone Detection and Code Search
    Mathew, George Varghese
    ProQuest Dissertations and Theses Global, 2022