Exploring the Boundaries Between LLM Code Clone Detection and Code Similarity Assessment on Human and AI-Generated Code

Authors
Zhang, Zixian [1]
Saber, Takfarinas [2]
Affiliations
[1] Univ Galway, Sch Comp Sci, CRT AI, Galway H91 TK33, Ireland
[2] Univ Galway, Sch Comp Sci, Lero, Galway H91 TK33, Ireland
Funding
Science Foundation Ireland
Keywords
code clone detection; code similarity; large language model; fine-tuning; LLM-generated code
DOI
10.3390/bdcc9020041
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As Large Language Models (LLMs) continue to advance, their capabilities in code clone detection have garnered significant attention. While much research has assessed LLM performance on human-generated code, the proliferation of LLM-generated code raises a critical and largely unexplored question: how well can LLMs detect clones across both human- and LLM-created codebases? This paper addresses this gap by evaluating two versions of LLaMA3 on both types of datasets. Additionally, we perform a deeper analysis beyond simple prompting, examining the nuanced relationship between code cloning and code similarity that LLMs infer. We further explore how fine-tuning impacts LLM performance in clone detection, offering new insights into the interplay between code clones and similarity in human versus AI-generated code. Our findings reveal that LLaMA models excel in detecting syntactic clones but face challenges with semantic clones. Notably, the models perform better on LLM-generated datasets for semantic clones, suggesting a potential bias. Fine-tuning enhances the ability of LLMs to comprehend code semantics, improving their performance in both code clone detection and code similarity assessment. Our results offer valuable insights into the effectiveness and characteristics of LLMs in clone detection and code similarity assessment, providing a foundation for future applications and guiding further research in this area.
Pages: 19