Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition

Authors
Neumuller, Denis [1]
Straub, Raphael [1]
Sihler, Florian [1]
Tichy, Matthias [1]
Affiliations
[1] Univ Ulm, Ulm, Germany
Keywords
algorithm recognition; program comprehension; pattern matching; abstract syntax tree; domain-specific language; reverse engineering; maintenance; open-source software
DOI
10.1109/ICCQ60895.2024.10576984
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
The automated recognition of algorithm implementations can support many software maintenance and reengineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing inefficient algorithms like Bubble Sort and suggesting superior alternatives from a library can help in assessing and improving the quality of a system. Approaches from related work suffer from usability as well as scalability issues, and their accuracy is not evaluated. In this paper, we investigate how well our approach based on the abstract syntax tree of a program performs for automatic algorithm recognition. To this end, we have implemented a prototype consisting of a domain-specific language designed to capture the key features of an algorithm and used to express a search pattern on the abstract syntax tree, a matching algorithm to find these features, and an initial catalog of "ready to use" patterns. To create our search patterns, we performed a web search using the algorithm name and described key features of the found reference implementations with our domain-specific language. We evaluate our prototype on a subset of the BigCloneEval benchmark containing algorithms like Fibonacci, Bubble Sort, and Binary Search. We achieve an average F1-score of 0.74, outperforming the large language model Codellama, which attains 0.35. Additionally, we use multiple code clone detection tools as a baseline for comparison, achieving a recall of 0.62, while the best-performing tool reaches 0.20.
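To make the idea of matching "key features" on an abstract syntax tree more concrete, the following is a minimal sketch in Python. It is not the authors' domain-specific language or matching algorithm, neither of which is shown in this record. It uses Python's standard `ast` module, and the function names (`compares_two_elements`, `has_swap`, `looks_like_bubble_sort`) and the chosen features (a loop nested in a loop, a comparison of two elements of the same sequence, a tuple swap) are assumptions made purely for illustration.

```python
import ast

LOOPS = (ast.For, ast.While)

def compares_two_elements(test):
    """True if `test` compares two subscripts of the same variable, e.g. a[j] > a[j + 1]."""
    if not (isinstance(test, ast.Compare) and len(test.comparators) == 1):
        return False
    left, right = test.left, test.comparators[0]
    return (
        isinstance(left, ast.Subscript)
        and isinstance(right, ast.Subscript)
        and isinstance(left.value, ast.Name)
        and isinstance(right.value, ast.Name)
        and left.value.id == right.value.id
    )

def has_swap(stmts):
    """True if any statement is a tuple swap such as x, y = y, x."""
    for stmt in stmts:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
            continue
        target, value = stmt.targets[0], stmt.value
        if not (isinstance(target, ast.Tuple) and isinstance(value, ast.Tuple)):
            continue
        if len(target.elts) != 2 or len(value.elts) != 2:
            continue
        # Compare source text so Load/Store contexts do not matter (Python 3.9+).
        t0, t1 = (ast.unparse(e) for e in target.elts)
        v0, v1 = (ast.unparse(e) for e in value.elts)
        if t0 == v1 and t1 == v0:
            return True
    return False

def looks_like_bubble_sort(source):
    """Hypothetical pattern: nested loop containing a guarded element swap."""
    tree = ast.parse(source)
    for outer in ast.walk(tree):
        if not isinstance(outer, LOOPS):
            continue
        for inner in ast.walk(outer):
            if inner is outer or not isinstance(inner, LOOPS):
                continue
            for node in ast.walk(inner):
                if (isinstance(node, ast.If)
                        and compares_two_elements(node.test)
                        and has_swap(node.body)):
                    return True
    return False

BUBBLE = """
def sort(a):
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
"""

BINARY = """
def search(a, x):
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        elif a[mid] > x:
            hi = mid - 1
        else:
            return mid
    return -1
"""

print(looks_like_bubble_sort(BUBBLE))  # True
print(looks_like_bubble_sort(BINARY))  # False
```

A hand-written check like this is deliberately coarse: it would also flag other nested-loop, swap-based sorts, and it misses implementations that swap through a temporary variable, which hints at why a dedicated pattern language and a catalog of patterns are useful.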
Pages: 18