Scalable and Systematic Detection of Buggy Inconsistencies in Source Code

Cited by: 30
Authors
Gabel, Mark [1]
Yang, Junfeng [2]
Yu, Yuan
Goldszmidt, Moises
Su, Zhendong [1]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Columbia Univ, New York, NY 10027 USA
Keywords
Languages; Reliability; Algorithms; Experimentation
DOI
10.1145/1932682.1869475
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs. DejaVu operates in two phases. Given a target code base, a parallel inconsistent clone analysis first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible buggy change analysis framework refines these results, separating each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifying each as benign or buggy. On a pre-production commercial code base of more than 75 million lines, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests with high likelihood that this report contains at least 2,000 true bugs and 1,000 code smells. These bugs draw from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.
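The two phases described in the abstract can be illustrated with a toy sketch. This is not DejaVu's algorithm, which the abstract does not specify: the token-level `difflib` comparison, the 0.8 similarity threshold, the "consistent rename" heuristic, and all function names below are assumptions chosen only to make the idea concrete on a pair of small fragments.

```python
import difflib
import re
from collections import Counter
from itertools import combinations

IDENT = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(fragment):
    """Split a code fragment into identifiers, numbers, and single symbols."""
    return TOKEN.findall(fragment)


def inconsistent_pairs(fragments, threshold=0.8):
    """Phase 1 (toy): name pairs whose fragments are similar but not identical.

    The threshold is an assumed value, not one taken from the paper.
    """
    tokens = {name: tokenize(src) for name, src in fragments.items()}
    pairs = []
    for a, b in combinations(sorted(tokens), 2):
        matcher = difflib.SequenceMatcher(None, tokens[a], tokens[b], autojunk=False)
        if threshold <= matcher.ratio() < 1.0:
            pairs.append((a, b))
    return pairs


def changes(tokens_a, tokens_b):
    """Phase 2a (toy): split one pair into fine-grained token-level changes."""
    matcher = difflib.SequenceMatcher(None, tokens_a, tokens_b, autojunk=False)
    return [
        (tuple(tokens_a[i1:i2]), tuple(tokens_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def classify(tokens_a, tokens_b):
    """Phase 2b (hypothetical filter): label each change.

    A single-identifier substitution that covers every occurrence of the old
    name is treated as a consistent rename ("benign"); anything else is
    flagged as "suspicious" for a human to review.
    """
    edits = changes(tokens_a, tokens_b)
    seen = Counter(edits)
    report = []
    for old, new in edits:
        is_rename = (
            len(old) == 1
            and len(new) == 1
            and IDENT.fullmatch(old[0]) is not None
            and IDENT.fullmatch(new[0]) is not None
        )
        consistent = is_rename and seen[(old, new)] == tokens_a.count(old[0])
        label = "benign" if consistent else "suspicious"
        report.append((" ".join(old), " ".join(new), label))
    return report


# A copied fragment in which `buf` was renamed to `src` in two of three places.
ORIGINAL = "if (buf != NULL) { len = strlen(buf); copy(dst, buf, len); }"
EDITED = "if (src != NULL) { len = strlen(src); copy(dst, buf, len); }"

if __name__ == "__main__":
    for first, second in inconsistent_pairs({"original": ORIGINAL, "edited": EDITED}):
        print(first, second, classify(tokenize(ORIGINAL), tokenize(EDITED)))
```

In this sketch the leftover `buf` in the edited copy is what makes both substitutions "suspicious", and the fix is suggested by the change itself, which mirrors the abstract's remark that syntactic inconsistencies both indicate problems and suggest solutions. The pairwise comparison is quadratic in the number of fragments and is only suitable for illustration; a system at the scale reported in the abstract would need a far more efficient grouping strategy.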
Pages: 175-190
Page count: 16