On the Role of Pre-trained Embeddings in Binary Code Analysis

Cited by: 0
Authors:
Maier, Alwin [1]
Weissberg, Felix [2]
Rieck, Konrad [2,3]
Affiliations:
[1] Max Planck Inst Solar Syst Res, Göttingen, Germany
[2] Tech Univ Berlin, Berlin, Germany
[3] BIFOLD, Berlin, Germany
Funding:
European Research Council
Keywords:
Transfer learning; Binary code analysis
DOI:
10.1145/3634737.3657029
CLC number:
TP [Automation Technology, Computer Technology]
Discipline code:
0812
Abstract:
Deep learning has enabled remarkable progress in binary code analysis. In particular, pre-trained embeddings of assembly code have become a gold standard for solving analysis tasks, such as measuring code similarity or recognizing functions. These embeddings learn a vector representation from unlabeled code. In contrast to natural language processing, however, label information is not scarce for many tasks in binary code analysis. For example, labeled training data for function boundaries, optimization levels, and argument types can be easily derived from debug information provided by a compiler. Consequently, the main motivation of embeddings does not transfer directly to binary code analysis. In this paper, we explore the role of pre-trained embeddings from a critical perspective. To this end, we systematically evaluate recent embeddings for assembly code on five downstream tasks using a corpus of 1.2 million functions from the Debian distribution. We observe that several embeddings perform similarly when sufficient labeled data is available, and that differences reported in prior work are hardly noticeable. Surprisingly, we find that end-to-end learning without pre-training performs best on average, which calls into question the need for specialized embeddings. By varying the amount of labeled data, we then derive guidelines for when embeddings offer advantages and when end-to-end learning is preferable for binary code analysis.
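To illustrate the abstract's observation that labels can be derived from compiler debug information, the following minimal sketch (an editor's addition, not the authors' pipeline) reads function-boundary labels from DWARF debug info using the pyelftools library. The input path 'example.bin' is hypothetical; the sketch assumes an ELF binary compiled with -g, as Debian debug packages provide.

    from elftools.elf.elffile import ELFFile

    # Hypothetical input path; any ELF binary compiled with -g will do.
    with open('example.bin', 'rb') as f:
        elf = ELFFile(f)
        if not elf.has_dwarf_info():
            raise SystemExit('no DWARF info; recompile with -g')
        dwarf = elf.get_dwarf_info()
        for cu in dwarf.iter_CUs():
            for die in cu.iter_DIEs():
                # DW_TAG_subprogram DIEs describe functions.
                if die.tag != 'DW_TAG_subprogram':
                    continue
                attrs = die.attributes
                if 'DW_AT_low_pc' not in attrs or 'DW_AT_high_pc' not in attrs:
                    continue  # declaration or inlined instance without a code range
                low = attrs['DW_AT_low_pc'].value
                high = attrs['DW_AT_high_pc']
                # Since DWARF 4, DW_AT_high_pc is typically an offset from
                # low_pc rather than an absolute address; the form tells us which.
                end = high.value if high.form == 'DW_FORM_addr' else low + high.value
                name = attrs.get('DW_AT_name')
                label = name.value.decode() if name else '<anonymous>'
                print(f'{label}: [{hex(low)}, {hex(end)})')

Each recovered (name, start, end) triple can serve directly as a ground-truth label for tasks such as function recognition, which is why labeled data is comparatively cheap in this domain.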
Pages: 795-810
Page count: 16