PYMT5: multi-mode translation of natural language and PYTHON code with transformers

Authors
Clement, Colin B. [1]
Drain, Dawn [1]
Timcheck, Jonathan [2]
Svyatkovskiy, Alexey [1]
Sundaresan, Neel [1]
Affiliations
[1] Microsoft Cloud & AI, Redwood City, CA 94063 USA
[2] Stanford Univ, Stanford, CA 94305 USA
Source
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Simultaneously modeling source code and natural language has many exciting applications in automated software development and understanding. Pursuant to achieving such technology, we introduce PYMT5, the PYTHON method text-to-text transfer transformer, which is trained to translate between all pairs of PYTHON method feature combinations: a single model that can both predict whole methods from natural language documentation strings (docstrings) and summarize code into docstrings of any common style. We present an analysis and modeling effort of a large-scale parallel corpus of 26 million PYTHON methods and 7.7 million method-docstring pairs, demonstrating that for docstring and method generation, PYMT5 outperforms similarly-sized auto-regressive language models (GPT2) which were English pre-trained or randomly initialized. On the CODESEARCHNET test set, our best model predicts 92.1% syntactically correct method bodies, achieves a BLEU score of 8.59 for method generation and 16.3 for docstring generation (summarization), and achieves a ROUGE-L F-score of 24.8 for method generation and 36.7 for docstring generation.
Pages: 9052-9065
Page count: 14