Fine-tuning Language Models for Joint Rewriting and Completion of Code with Potential Bugs

Citations: 0
Authors
Wang, Dingmin [1 ]
Zhao, Jinman [2 ]
Pei, Hengzhi [2 ]
Tan, Samson [3 ]
Zha, Sheng [3 ]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Amazon Web Serv, Seattle, WA USA
[3] Amazon AGI, Seattle, WA USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Handling drafty partial code remains a notable challenge in real-time code suggestion applications. Previous work has demonstrated shortcomings of large language models of code (CodeLLMs) in completing partial code with potential bugs. In this study, we view partial code as implementation hints and fine-tune CodeLLMs to jointly rewrite and complete partial code into functional full programs. We explore two strategies: one-pass generation and multi-pass iterative refinement. We construct new training and testing datasets using semantic-altering code transformations and iterative self-generations. We conduct comprehensive experiments over three representative open-source CodeLLMs: InCoder, CodeGen, and StarCoder. Results show that CodeLLMs fine-tuned using our approach achieve superior pass rates compared to previous baselines across existing and newly created benchmarks, effectively handle both potentially buggy and clean code, and largely preserve the integrity of the original partial implementations. We further present findings on the properties of the potential bugs we tested and on the design choices of our methods.
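To make the setup described in the abstract concrete, below is a minimal Python sketch of the task framing: a clean partial implementation is perturbed by a semantic-altering transformation into a potentially buggy prefix, and the model is then asked to treat that prefix as an implementation hint and produce a complete, correct function (the one-pass strategy). The function names, example program, and prompt wording are illustrative assumptions, not the paper's exact data-construction or fine-tuning format.

# A minimal, illustrative sketch (not the authors' exact pipeline) of how a
# "potentially buggy" partial-code problem can be derived from a clean one
# and framed as a joint rewrite-and-complete task.

CLEAN_PARTIAL = '''def max_subarray(nums):
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)'''


def inject_operator_bug(partial_code: str) -> str:
    """Toy semantic-altering transformation: swap max -> min in one place
    so the prefix no longer extends to a correct solution."""
    return partial_code.replace("best = max(best, cur)",
                                "best = min(best, cur)", 1)


def build_rewrite_and_complete_prompt(partial_code: str) -> str:
    """Hypothetical prompt for one-pass joint rewriting and completion;
    the actual fine-tuning format used in the paper may differ."""
    return (
        "# The partial code below may contain bugs. Use it as an\n"
        "# implementation hint and write a complete, correct function.\n"
        + partial_code + "\n"
    )


if __name__ == "__main__":
    buggy_partial = inject_operator_bug(CLEAN_PARTIAL)
    print(build_rewrite_and_complete_prompt(buggy_partial))
    # A CodeLLM fine-tuned as described would be expected to emit the full
    # corrected function while preserving most of the original partial code;
    # the multi-pass variant would instead refine its own output iteratively.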
Pages: 15854-15868
Page count: 15