A syntax-guided multi-task learning approach for Turducken-style code generation

Cited by: 2
Authors
Yang, Guang [1]
Zhou, Yu [1]
Chen, Xiang [2]
Zhang, Xiangyu [1]
Xu, Yiran [1]
Han, Tingting [3]
Chen, Taolue [3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
[3] Birkbeck Univ London, Dept Comp Sci, London, England
Funding
National Natural Science Foundation of China;
Keywords
Syntactically-constrained code generation; Turducken-style code; Multi-task learning; CodeT5; Abstract syntax tree;
DOI
10.1007/s10664-023-10372-1
Chinese Library Classification number
TP31 [Computer Software];
Subject classification codes
081202; 0835;
Abstract
Thanks to advances in pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code does not always adhere to the syntactic constraints of the target language, especially in the case of Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we summarize three significant challenges regarding syntactic constraints: (1) efficient representation of syntactic constraints, (2) effective integration of syntactic information, and (3) a scalable syntax-first decoding algorithm. To address these challenges, we propose TurduckenGen, a syntax-guided multi-task learning approach. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. We then formalize code generation with syntactic constraint representation as an auxiliary task so that the model learns the syntactic constraints of the code. Finally, syntactically correct code is selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis, comparing our approach with six state-of-the-art baselines on two Turducken-style code datasets, demonstrate its effectiveness and general applicability. We also conducted a human study and found that the code generated by our approach is better than that of the baselines in terms of readability and semantic similarity.
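Two of the steps named in the abstract, appending type information to code tokens and selecting a syntactically correct candidate using compiler feedback, can be illustrated with a minimal Python sketch. This is not the TurduckenGen implementation: the function names `annotate_tokens` and `first_valid_candidate` are hypothetical, the `token<TYPE>` format is an assumed notation, the lexical token types from Python's `tokenize` module stand in for whatever type scheme the paper uses, and `ast.parse` stands in for compiler feedback.

```python
import ast
import io
import tokenize
from typing import List, Optional

# Layout-only token types that carry no visible text worth annotating.
_LAYOUT = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def annotate_tokens(code: str) -> List[str]:
    """Append a coarse lexical type to every token, e.g. 'x<NAME>'.

    Assumption: the '<TYPE>' suffix format is illustrative only.
    """
    annotated = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type in _LAYOUT:
            continue
        annotated.append(f"{tok.string}<{tokenize.tok_name[tok.type]}>")
    return annotated


def first_valid_candidate(candidates: List[str]) -> Optional[str]:
    """Return the first candidate that parses, or None if none does.

    Assumption: candidates arrive ranked best-first (e.g. from beam
    search), and a successful ast.parse is treated as the feedback signal.
    """
    for candidate in candidates:
        try:
            ast.parse(candidate)
        except SyntaxError:
            continue
        return candidate
    return None


if __name__ == "__main__":
    print(annotate_tokens("x = 1"))
    ranked = ["def f(:", "def f():\n    return 1\n"]
    print(first_valid_candidate(ranked))
```

The selection step here only checks whether a candidate parses; a real pipeline for Turducken-style code would also need to validate the embedded declarative snippets, which this sketch does not attempt.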
Pages: 35