Identifying Functionally Similar Code in Complex Codebases

Cited by: 0
Authors
Su, Fang-Hsiang [1]
Bell, Jonathan [1]
Kaiser, Gail [1]
Sethumadhavan, Simha [1]
Affiliation
[1] Columbia Univ, New York, NY 10027 USA
Keywords
I/O behavior; dynamic analysis; code clone detection; data flow analysis; patterns
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Identifying similar code in software systems can assist many software engineering tasks, such as program understanding and software refactoring. While most approaches focus on identifying code that looks alike, some techniques aim to detect code that functions alike. Detecting these functional clones in object-oriented languages remains an open question because it is difficult to expose and compare programs' functionality effectively. We propose a novel technique, In-Vivo Clone Detection, that detects functional clones in arbitrary programs by identifying and mining their inputs and outputs. The key insight is to use existing workloads to execute programs and then measure functional similarity between programs based on their inputs and outputs, which mitigates the problems in object-oriented languages reported by prior work. We implement this technique in our system, HitoshiIO, which is open source and freely available. Our experimental results show that HitoshiIO detects more than 800 functional clones across a corpus of 118 projects. In a random sample of the detected clones, HitoshiIO achieves a 68+% true positive rate with only a 15% false positive rate.
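The key insight in the abstract (run programs under an existing workload, then compare what they read and produce) can be illustrated with a minimal sketch. This is not HitoshiIO's implementation: the helper names `record_io`, `io_similarity`, and `find_functional_clones`, the Jaccard measure, the toy workload, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set, Tuple

# One observed behavior: (repr of the inputs, repr of the output).
IOPair = Tuple[str, str]


def record_io(func: Callable, workload: Sequence[tuple]) -> Set[IOPair]:
    """Run func on every argument tuple in the workload and record its I/O."""
    observed: Set[IOPair] = set()
    for args in workload:
        observed.add((repr(args), repr(func(*args))))
    return observed


def io_similarity(a: Set[IOPair], b: Set[IOPair]) -> float:
    """Jaccard similarity of two I/O profiles (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_functional_clones(
    funcs: Sequence[Callable],
    workload: Sequence[tuple],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Report function pairs whose I/O profiles meet the threshold."""
    profiles = {f.__name__: record_io(f, workload) for f in funcs}
    clones = []
    for left, right in combinations(profiles, 2):
        score = io_similarity(profiles[left], profiles[right])
        if score >= threshold:
            clones.append((left, right, score))
    return clones


# Two functions that look different but behave alike, plus one that does not.
def sum_loop(values):
    total = 0
    for v in values:
        total += v
    return total


def sum_builtin(values):
    return sum(values)


def product(values):
    result = 1
    for v in values:
        result *= v
    return result


# Toy stand-in for an "existing workload": each entry is one argument tuple.
WORKLOAD = [((1, 2, 3),), ((4, 5),), ((0,),), ((2, 2),)]

if __name__ == "__main__":
    print(find_functional_clones([sum_loop, sum_builtin, product], WORKLOAD))
```

On this toy workload only `sum_loop` and `sum_builtin` are reported. `product` agrees with them on three of the four inputs, which shows how a workload with limited coverage can push unrelated code toward a false positive.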
Pages: 10