SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly

Cited by: 1
Authors
Armengol-Estape, Jordi [1 ]
Woodruff, Jackson [1 ]
Cummins, Chris [2 ]
O'Boyle, Michael F. P. [1 ]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[2] Meta AI Res, Menlo Pk, CA USA
Source
2024 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, CGO | 2024
Keywords
decompilation; neural decompilation; Transformer; language models; type inference;
DOI
10.1109/CGO57630.2024.10444788
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and ISAs. Recent interest in neural approaches has produced portable tools that generate readable code. Nevertheless, to date such techniques are usually restricted to synthetic programs without optimization, and no models have evaluated their portability. Furthermore, while the code generated may be more readable, it is usually incorrect. This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence Transformer trained over real-world code and augmented with a type inference engine. We utilize a novel tokenizer, dropout-free regularization, and type inference to generate programs that are more readable and accurate than standard analytic and recent neural approaches. Unlike standard approaches, SLaDe can infer out-of-context types, and unlike neural approaches, it generates correct code. We evaluate SLaDe on over 4,000 ExeBench functions on two ISAs and at two optimization levels. SLaDe is up to 6x more accurate than Ghidra, a state-of-the-art, industrial-strength decompiler, up to 4x more accurate than the large language model ChatGPT, and generates significantly more readable code than both.
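
The abstract frames decompilation as sequence-to-sequence translation from an assembly listing to C source. The following minimal Python sketch illustrates that framing with a small encoder-decoder model through the Hugging Face transformers API; it is not the authors' implementation, and the checkpoint path, beam width, length limit, and example assembly are illustrative assumptions.

# Minimal sketch (not the authors' released code): decompilation framed as
# sequence-to-sequence translation from optimized assembly to C source.
# The checkpoint path and generation settings are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CHECKPOINT = "path/to/small-asm-to-c-seq2seq"  # hypothetical fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

def decompile(asm_listing: str, max_new_tokens: int = 512) -> str:
    """Translate one assembly function into a candidate C definition."""
    inputs = tokenizer(asm_listing, return_tensors="pt", truncation=True)
    output_ids = model.generate(
        **inputs,
        num_beams=5,                 # beam search over candidate decompilations
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(decompile("square:\n    imul edi, edi\n    mov eax, edi\n    ret\n"))

In a pipeline like the one the abstract describes, the generated C would then pass through a type inference stage to resolve out-of-context types, and correctness could be checked by compiling the output and comparing its behavior against the original, e.g. on ExeBench's executable test harnesses.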
Pages
67 - 80 (14 pages)