DIRECT: A Transformer-based Model for Decompiled Variable Name Recovery

Cited by: 0
Authors
Nitin, Vikram [1]
Saieva, Anthony [1]
Ray, Baishakhi [1]
Kaiser, Gail [1]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
Keywords
DOI
None
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Decompiling binary executables to high-level code is an important step in reverse engineering scenarios, such as malware analysis and legacy code maintenance. However, the generated high-level code is difficult to understand since the original variable names are lost. In this paper, we leverage transformer models to reconstruct the original variable names from decompiled code. Inherent differences between code and natural language present certain challenges in applying conventional transformer-based architectures to variable name recovery. We propose DIRECT, a novel transformer-based architecture customized specifically for the task at hand. We evaluate our model on a dataset of decompiled functions and find that DIRECT outperforms the previous state-of-the-art model by up to 20%. We also present ablation studies evaluating the impact of each of our modifications. We make the source code of DIRECT available to encourage reproducible research.
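The abstract frames variable name recovery as predicting original names for the placeholder identifiers (e.g. `v1`, `a2`) that decompilers emit. A minimal sketch of that input framing, assuming a hypothetical placeholder pattern and mask-token scheme (the paper's actual tokenization is not specified here):

```python
import re

def extract_rename_targets(decompiled: str) -> list[str]:
    """Collect decompiler-generated placeholder names (assumed here to
    match v<N> or a<N>) that a name-recovery model would replace."""
    return sorted(set(re.findall(r'\b[va]\d+\b', decompiled)))

def mask_placeholders(decompiled: str, targets: list[str]) -> str:
    """Replace each placeholder with an indexed mask token -- a common
    seq2seq input framing for name-recovery models."""
    for i, name in enumerate(targets):
        decompiled = re.sub(rf'\b{name}\b', f'<VAR{i}>', decompiled)
    return decompiled

# Toy decompiled snippet: the model would be asked to predict a
# meaningful name for each <VARi> slot.
src = "int v1 = a1 + 1; return v1 * a1;"
targets = extract_rename_targets(src)     # ['a1', 'v1']
masked = mask_placeholders(src, targets)
# "int <VAR1> = <VAR0> + 1; return <VAR1> * <VAR0>;"
```

This only illustrates the task setup; the transformer itself would consume the masked sequence and emit one name per slot.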
Pages: 48 - 57
Page count: 10
Related papers
(50 records in total)
  • [1] Transformer-based approach to variable typing
    Rey, Charles Arthel
    Danguilan, Jose Lorenzo
    Mendoza, Karl Patrick
    Remolona, Miguel Francisco
    HELIYON, 2023, 9 (10)
  • [2] Transformer-Based Direct Hidden Markov Model for Machine Translation
    Wang, Weiyue
    Yang, Zijian
    Gao, Yingbo
    Ney, Hermann
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2021, : 23 - 32
  • [3] Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
    Sant, Gerard
    Gállego, Gerard I.
    Alastruey, Belen
    Costa-Jussà, Marta R.
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2022, : 277 - 284
  • [5] A Transformer-based Function Symbol Name Inference Model from an Assembly Language for Binary Reversing
    Kim, HyunJin
    Bak, JinYeong
    Cho, Kyunghyun
    Koo, Hyungjoon
    PROCEEDINGS OF THE 2023 ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, ASIA CCS 2023, 2023, : 951 - 965
  • [6] Direct conversion of peptides into diverse peptidomimetics using a transformer-based chemical language model
    Yoshimori, Atsushi
    Bajorath, Juergen
    EUROPEAN JOURNAL OF MEDICINAL CHEMISTRY REPORTS, 2025, 13
  • [7] Transformer-based Image Compression with Variable Image Quality Objectives
    Kao, Chia-Hao
    Chen, Yi-Hsin
    Chien, Cheng
    Chiu, Wei-Chen
    Peng, Wen-Hsiao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1718 - 1725
  • [8] Smart Transformer-based Frequency Support in Variable Inertia Conditions
    Langwasser, Marius
    De Carne, Giovanni
    Liserre, Marco
    2019 IEEE 13TH INTERNATIONAL CONFERENCE ON COMPATIBILITY, POWER ELECTRONICS AND POWER ENGINEERING (CPE-POWERENG), 2019,
  • [10] Vision Transformer-Based Photovoltaic Prediction Model
    Kang, Zaohui
    Xue, Jizhong
    Lai, Chun Sing
    Wang, Yu
    Yuan, Haoliang
    Xu, Fangyuan
    ENERGIES, 2023, 16 (12)