Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model

Cited by: 0
Authors
Kaichi, Ryunosuke [1 ]
Matsumoto, Shinsuke [1 ]
Kusumoto, Shinji [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Keywords
decompiler; fine-tuning; deep learning; quirk; grammatical error correction;
DOI
10.1007/978-3-031-49266-2_18
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
A decompiler is a system that recovers source code from bytecode. A critical challenge for decompilers is that the decompiled code differs from the original code. These differences, called quirks, not only reduce the readability of the source code but may also change the program's behavior. In this study, we propose a deep learning-based quirk fixation method that adopts grammatical error correction. One advantage of the proposed method is that it can be applied to any decompiler and programming language. Our experimental results show that the proposed method removes 55% of identifier quirks and 91% of structural quirks. In some cases, however, the proposed method injects a small number of new quirks.
Pages: 259-266
Page count: 8
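The abstract frames quirk fixation as grammatical error correction over code: a pre-trained sequence-to-sequence model is fine-tuned on pairs of decompiled code (with quirks) and the corresponding original code. Below is a minimal sketch of that setup, not the authors' implementation: the CodeT5 backbone, the Hugging Face Trainer, the hyperparameters, and the toy training pair are all assumptions, since the paper does not bind the method to a specific model, decompiler, or programming language.

```python
# Minimal sketch (assumptions: CodeT5 backbone, Hugging Face Trainer, toy data).
# Quirk fixation is treated like grammatical error correction: the model learns
# to map decompiled code with quirks to the original code.
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Trainer, TrainingArguments


class QuirkPairDataset(Dataset):
    """Pairs of (decompiled code with quirks, original code)."""

    def __init__(self, pairs, tokenizer, max_len=512):
        self.pairs = pairs
        self.tok = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        decompiled, original = self.pairs[i]
        enc = self.tok(decompiled, truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        lab = self.tok(original, truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        labels = lab["input_ids"].squeeze(0)
        labels[labels == self.tok.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc["input_ids"].squeeze(0),
                "attention_mask": enc["attention_mask"].squeeze(0),
                "labels": labels}


tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

# Hypothetical training pair: decompiler output vs. the original source.
train_pairs = [
    ("int var1 = 0; while (true) { if (var1 >= 10) break; var1++; }",
     "for (int i = 0; i < 10; i++) { }"),
]
train_ds = QuirkPairDataset(train_pairs, tokenizer)

args = TrainingArguments(output_dir="quirk-fixer", num_train_epochs=3,
                         per_device_train_batch_size=4, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()

# Inference: "correct" a decompiled snippet.
inputs = tokenizer(train_pairs[0][0], return_tensors="pt")
out = model.generate(**inputs, max_length=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the approach only needs (decompiled, original) training pairs and a text-to-text model, the same pipeline can in principle be pointed at any decompiler and programming language, which is the generality the abstract claims.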