SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly

Cited by: 1
Authors
Armengol-Estape, Jordi [1 ]
Woodruff, Jackson [1 ]
Cummins, Chris [2 ]
O'Boyle, Michael F. P. [1 ]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[2] Meta AI Res, Menlo Pk, CA USA
Source
2024 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, CGO | 2024
Keywords
decompilation; neural decompilation; Transformer; language models; type inference;
DOI
10.1109/CGO57630.2024.10444788
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and ISAs. Recent interest in neural approaches has produced portable tools that generate readable code. Nevertheless, to date such techniques are usually restricted to synthetic programs without optimization, and no models have evaluated their portability. Furthermore, while the code generated may be more readable, it is usually incorrect. This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence Transformer trained over real-world code and augmented with a type inference engine. We utilize a novel tokenizer, dropout-free regularization, and type inference to generate programs that are more readable and accurate than standard analytic and recent neural approaches. Unlike standard approaches, SLaDe can infer out-of-context types, and unlike neural approaches, it generates correct code. We evaluate SLaDe on over 4,000 ExeBench functions on two ISAs and at two optimization levels. SLaDe is up to 6x more accurate than Ghidra, a state-of-the-art, industrial-strength decompiler, up to 4x more accurate than the large language model ChatGPT, and generates significantly more readable code than both.
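
The abstract frames decompilation as sequence-to-sequence translation from an assembly listing to C source. The following minimal Python sketch illustrates that framing with a small encoder-decoder model through the Hugging Face transformers API; it is not the authors' implementation, and the checkpoint path, beam width, length limit, and example assembly are illustrative assumptions.

# Minimal sketch (not the authors' released code): decompilation framed as
# sequence-to-sequence translation from optimized assembly to C source.
# The checkpoint path and generation settings are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CHECKPOINT = "path/to/small-asm-to-c-seq2seq"  # hypothetical fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

def decompile(asm_listing: str, max_new_tokens: int = 512) -> str:
    """Translate one assembly function into a candidate C definition."""
    inputs = tokenizer(asm_listing, return_tensors="pt", truncation=True)
    output_ids = model.generate(
        **inputs,
        num_beams=5,                 # beam search over candidate decompilations
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(decompile("square:\n    imul edi, edi\n    mov eax, edi\n    ret\n"))

In a pipeline like the one the abstract describes, the generated C would then pass through a type inference stage to resolve out-of-context types, and correctness could be checked by compiling the output and comparing its behavior against the original, e.g. on ExeBench's executable test harnesses.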
Pages
67 - 80 (14 pages)