Identifying Functionally Similar Code in Complex Codebases

Authors
Su, Fang-Hsiang [1]
Bell, Jonathan [1]
Kaiser, Gail [1]
Sethumadhavan, Simha [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
Keywords
I/O behavior; dynamic analysis; code clone detection; data flow analysis; patterns
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Identifying similar code in software systems can assist many software engineering tasks, such as program understanding and software refactoring. While most approaches focus on identifying code that looks alike, some techniques aim to detect code that functions alike. Detecting these functional clones in object-oriented languages remains an open question because of the difficulty of exposing and comparing programs' functionality effectively. We propose a novel technique, In-Vivo Clone Detection, that detects functional clones in arbitrary programs by identifying and mining their inputs and outputs. The key insight is to use existing workloads to execute programs and then measure functional similarity between programs based on their inputs and outputs, which mitigates the problems in object-oriented languages reported by prior work. We implement this technique in our system, HitoshiIO, which is open source and freely available. Our experimental results show that HitoshiIO detects more than 800 functional clones across a corpus of 118 projects. In a random sample of the detected clones, HitoshiIO achieves a true positive rate of 68+% with a false positive rate of only 15%.
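The core idea in the abstract, comparing programs by what they consume and produce rather than by how they look, can be illustrated with a small sketch. This is not HitoshiIO's algorithm: the function names (`record_io`, `jaccard`, `functional_clones`), the Jaccard measure, the 0.8 threshold, and the toy workload are all assumptions made for illustration. The sketch also feeds the same inputs to every candidate, which is a simplification of observing methods under existing workloads.

```python
from collections import defaultdict
from itertools import combinations


def record_io(functions, workload):
    """Run each function on each workload input and record (input, output) pairs."""
    profiles = defaultdict(set)
    for fn in functions:
        for args in workload:
            try:
                out = fn(*args)
            except Exception as exc:  # an exception is treated as an observable outcome
                out = ("raised", type(exc).__name__)
            profiles[fn.__name__].add((args, out))
    return profiles


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def functional_clones(profiles, threshold=0.8):
    """Return pairs of functions whose I/O profiles are at least `threshold` similar."""
    pairs = []
    for (n1, p1), (n2, p2) in combinations(sorted(profiles.items()), 2):
        score = jaccard(p1, p2)
        if score >= threshold:
            pairs.append((n1, n2, score))
    return pairs


# Toy candidates: two syntactically different sums and one product.
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    return sum(xs)


def product(xs):
    result = 1
    for x in xs:
        result *= x
    return result


# Hypothetical workload: each entry is a tuple of positional arguments.
workload = [((1, 2, 3),), ((4, 5),), ((),), ((2, 3, 4),)]
profiles = record_io([sum_loop, sum_builtin, product], workload)
clones = functional_clones(profiles)
print(clones)
```

In this toy run, `sum_loop` and `sum_builtin` produce identical I/O profiles and are reported as a pair, while `product` agrees with them on only one input, `(1, 2, 3)`, and stays below the threshold. That single coincidental match shows why a thin workload can inflate similarity scores.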
Pages: 10