SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly

Cited by: 1
Authors
Armengol-Estape, Jordi [1]
Woodruff, Jackson [1]
Cummins, Chris [2]
O'Boyle, Michael F. P. [1]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[2] Meta AI Res, Menlo Pk, CA USA
Keywords
decompilation; neural decompilation; Transformer; language models; type inference
DOI
10.1109/CGO57630.2024.10444788
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and ISAs. Recent interest in neural approaches has produced portable tools that generate readable code. Nevertheless, to date such techniques are usually restricted to synthetic programs without optimization, and no models have evaluated their portability. Furthermore, while the code generated may be more readable, it is usually incorrect. This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence Transformer trained over real-world code and augmented with a type inference engine. We utilize a novel tokenizer, dropout-free regularization, and type inference to generate programs that are more readable and accurate than standard analytic and recent neural approaches. Unlike standard approaches, SLaDe can infer out-of-context types and, unlike neural approaches, it generates correct code. We evaluate SLaDe on over 4,000 ExeBench functions on two ISAs and at two optimization levels. SLaDe is up to 6x more accurate than Ghidra, a state-of-the-art, industrial-strength decompiler, and up to 4x more accurate than the large language model ChatGPT, and generates significantly more readable code than both.
Pages: 67-80
Page count: 14