Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model

Cited by: 0
Authors
Kaichi, Ryunosuke [1 ]
Matsumoto, Shinsuke [1 ]
Kusumoto, Shinji [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka, Japan
Keywords
decompiler; fine-tuning; deep learning; quirk; grammatical error correction;
DOI
10.1007/978-3-031-49266-2_18
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
A decompiler is a system that recovers source code from bytecode. A critical challenge for decompilers is that the decompiled code differs from the original code. These differences, called quirks, not only reduce the readability of the source code but may also change the program's behavior. In this study, we propose a deep learning-based quirk fixation method that adopts grammatical error correction. One advantage of the proposed method is that it can be applied to any decompiler and programming language. Our experimental results show that the proposed method removes 55% of identifier quirks and 91% of structural quirks. In some cases, however, the proposed method injects a small number of new quirks.
Pages: 259-266
Page count: 8
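The abstract frames quirk fixation as grammatical error correction over code: a pre-trained sequence-to-sequence model is fine-tuned on pairs of decompiled code (with quirks) and the corresponding original code. Below is a minimal sketch of that setup, not the authors' implementation: the CodeT5 backbone, the Hugging Face Trainer, the hyperparameters, and the toy training pair are all assumptions, since the paper does not bind the method to a specific model, decompiler, or programming language.

```python
# Minimal sketch (assumptions: CodeT5 backbone, Hugging Face Trainer, toy data).
# Quirk fixation is treated like grammatical error correction: the model learns
# to map decompiled code with quirks to the original code.
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Trainer, TrainingArguments


class QuirkPairDataset(Dataset):
    """Pairs of (decompiled code with quirks, original code)."""

    def __init__(self, pairs, tokenizer, max_len=512):
        self.pairs = pairs
        self.tok = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        decompiled, original = self.pairs[i]
        enc = self.tok(decompiled, truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        lab = self.tok(original, truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        labels = lab["input_ids"].squeeze(0)
        labels[labels == self.tok.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc["input_ids"].squeeze(0),
                "attention_mask": enc["attention_mask"].squeeze(0),
                "labels": labels}


tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

# Hypothetical training pair: decompiler output vs. the original source.
train_pairs = [
    ("int var1 = 0; while (true) { if (var1 >= 10) break; var1++; }",
     "for (int i = 0; i < 10; i++) { }"),
]
train_ds = QuirkPairDataset(train_pairs, tokenizer)

args = TrainingArguments(output_dir="quirk-fixer", num_train_epochs=3,
                         per_device_train_batch_size=4, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()

# Inference: "correct" a decompiled snippet.
inputs = tokenizer(train_pairs[0][0], return_tensors="pt")
out = model.generate(**inputs, max_length=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the approach only needs (decompiled, original) training pairs and a text-to-text model, the same pipeline can in principle be pointed at any decompiler and programming language, which is the generality the abstract claims.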