Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition

Authors
Neumuller, Denis [1]
Straub, Raphael [1]
Sihler, Florian [1]
Tichy, Matthias [1]
Affiliations
[1] Univ Ulm, Ulm, Germany
Keywords
algorithm recognition; program comprehension; pattern matching; abstract syntax tree; domain-specific language; reverse engineering; maintenance; open-source software
DOI
10.1109/ICCQ60895.2024.10576984
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
The automated recognition of algorithm implementations can support many software maintenance and reengineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing inefficient algorithms such as Bubble Sort and suggesting superior alternatives from a library can help in assessing and improving the quality of a system. Approaches from related work suffer from usability and scalability issues, and their accuracy has not been evaluated. In this paper, we investigate how well our approach, which is based on the abstract syntax tree of a program, performs for automatic algorithm recognition. To this end, we have implemented a prototype consisting of a domain-specific language designed to capture the key features of an algorithm and used to express a search pattern on the abstract syntax tree, a matching algorithm to find these features, and an initial catalog of "ready to use" patterns. To create our search patterns, we performed a web search using the algorithm name and described the key features of the reference implementations we found with our domain-specific language. We evaluate our prototype on a subset of the BigCloneEval benchmark containing algorithms such as Fibonacci, Bubble Sort, and Binary Search. We achieve an average F1-score of 0.74, outperforming the large language model Code Llama, which attains 0.35. Additionally, we use multiple code clone detection tools as a baseline for comparison, achieving a recall of 0.62, while the best-performing tool reaches 0.20.
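To make the idea of matching "key features" on an abstract syntax tree concrete, the following is a minimal sketch only. It is not the authors' domain-specific language or matching algorithm, which this record does not show. It assumes Python's standard `ast` module as a stand-in for the analyzed language, and it assumes a hand-picked set of Bubble Sort features: a loop nested in another loop, an `if` that compares two subscripted elements, and a tuple swap of two subscripted elements. The function names and sample sources are hypothetical.

```python
import ast

LOOPS = (ast.For, ast.While)


def has_subscript_compare(if_node: ast.If) -> bool:
    """Feature: the condition compares two subscripted values, e.g. a[j] > a[j + 1]."""
    test = if_node.test
    return (
        isinstance(test, ast.Compare)
        and isinstance(test.left, ast.Subscript)
        and len(test.comparators) == 1
        and isinstance(test.comparators[0], ast.Subscript)
    )


def has_tuple_swap(if_node: ast.If) -> bool:
    """Feature: the body holds a swap such as a[j], a[j + 1] = a[j + 1], a[j]."""
    for stmt in if_node.body:
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Tuple)
            and isinstance(stmt.value, ast.Tuple)
            and len(stmt.targets[0].elts) == 2
            and len(stmt.value.elts) == 2
            and all(isinstance(e, ast.Subscript) for e in stmt.targets[0].elts)
            and all(isinstance(e, ast.Subscript) for e in stmt.value.elts)
        ):
            return True
    return False


def looks_like_bubble_sort(source: str) -> bool:
    """Return True if some nested loop contains a compare-and-swap `if`."""
    tree = ast.parse(source)
    for outer in ast.walk(tree):
        if not isinstance(outer, LOOPS):
            continue
        for inner in ast.walk(outer):
            # ast.walk yields the root first, so skip the outer loop itself.
            if inner is outer or not isinstance(inner, LOOPS):
                continue
            for node in ast.walk(inner):
                if (
                    isinstance(node, ast.If)
                    and has_subscript_compare(node)
                    and has_tuple_swap(node)
                ):
                    return True
    return False


# Hypothetical sample inputs for illustration.
BUBBLE_SORT_SRC = """
def sort(a):
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
"""

BINARY_SEARCH_SRC = """
def search(a, x):
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
"""

print(looks_like_bubble_sort(BUBBLE_SORT_SRC))
print(looks_like_bubble_sort(BINARY_SEARCH_SRC))
```

A pattern this coarse would probably also flag other exchange-based sorts that swap inside a nested loop, which hints at why a dedicated pattern language and a measured precision and recall matter for this task.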
Pages: 18