Fine-tuning Language Models for Joint Rewriting and Completion of Code with Potential Bugs

Citations: 0
Authors
Wang, Dingmin [1 ]
Zhao, Jinman [2 ]
Pei, Hengzhi [2 ]
Tan, Samson [3 ]
Zha, Sheng [3 ]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Amazon Web Serv, Seattle, WA USA
[3] Amazon AGI, Seattle, WA USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Handling drafty partial code remains a notable challenge in real-time code suggestion applications. Previous work has demonstrated shortcomings of large language models of code (CodeLLMs) in completing partial code with potential bugs. In this study, we view partial code as implementation hints and fine-tune CodeLLMs to jointly rewrite and complete partial code into functional full programs. We explore two strategies: one-pass generation and multi-pass iterative refinement. We construct new training and testing datasets using semantic-altering code transformations and iterative self-generations. We conduct comprehensive experiments over three representative open-source CodeLLMs: InCoder, CodeGen, and StarCoder. Results show that CodeLLMs fine-tuned using our approach achieve superior pass rates compared to previous baselines across existing and newly created benchmarks, effectively handle both potentially buggy and clean code, and largely preserve the integrity of the original partial implementations. We further present findings on the properties of the potential bugs we tested and on the design choices of our methods.
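To make the setup described in the abstract concrete, below is a minimal Python sketch of the task framing: a clean partial implementation is perturbed by a semantic-altering transformation into a potentially buggy prefix, and the model is then asked to treat that prefix as an implementation hint and produce a complete, correct function (the one-pass strategy). The function names, example program, and prompt wording are illustrative assumptions, not the paper's exact data-construction or fine-tuning format.

# A minimal, illustrative sketch (not the authors' exact pipeline) of how a
# "potentially buggy" partial-code problem can be derived from a clean one
# and framed as a joint rewrite-and-complete task.

CLEAN_PARTIAL = '''def max_subarray(nums):
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)'''


def inject_operator_bug(partial_code: str) -> str:
    """Toy semantic-altering transformation: swap max -> min in one place
    so the prefix no longer extends to a correct solution."""
    return partial_code.replace("best = max(best, cur)",
                                "best = min(best, cur)", 1)


def build_rewrite_and_complete_prompt(partial_code: str) -> str:
    """Hypothetical prompt for one-pass joint rewriting and completion;
    the actual fine-tuning format used in the paper may differ."""
    return (
        "# The partial code below may contain bugs. Use it as an\n"
        "# implementation hint and write a complete, correct function.\n"
        + partial_code + "\n"
    )


if __name__ == "__main__":
    buggy_partial = inject_operator_bug(CLEAN_PARTIAL)
    print(build_rewrite_and_complete_prompt(buggy_partial))
    # A CodeLLM fine-tuned as described would be expected to emit the full
    # corrected function while preserving most of the original partial code;
    # the multi-pass variant would instead refine its own output iteratively.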
Pages: 15854-15868
Page count: 15