Research Progress of Code Naturalness and Its Application

Authors
Chen Z.-Z. [1, 2]
Yan M. [1, 2]
Xia X. [3]
Liu Z.-X. [4]
Xu Z. [1, 2]
Lei Y. [1, 2]
Affiliations
[1] Key Laboratory of Dependable Service Computing in Cyber Physical Society, Chongqing University, Ministry of Education, Chongqing
[2] School of Big Data and Software Engineering, Chongqing University, Chongqing
[3] Faculty of Information Technology, Monash University, Melbourne, 3800, VIC
[4] College of Computer Science and Technology, Zhejiang University, Hangzhou
Source
Ruan Jian Xue Bao/Journal of Software, 2022, Vol. 33, No. 8
Keywords
code language model; code naturalness; mining software repositories
DOI
10.13328/j.cnki.jos.006355
Abstract
Code naturalness is a research hotspot shared by natural language processing and software engineering. Research in this area builds models of code naturalness with natural language processing techniques and uses them to solve a variety of software engineering tasks. In recent years, as the volume of source code and data in open-source software communities has continued to grow, more and more researchers have turned to the information contained in source code, and a series of research results have been achieved. At the same time, code naturalness research faces many challenges in code corpus construction, model building, and task application. In view of this, this paper reviews and summarizes recent progress in code naturalness research and its applications from the perspectives of code corpus construction, model construction, and task application. The main contents include: (1) introducing the basic concept of code naturalness and an overview of its research; (2) summarizing the corpora currently used in code naturalness research, and classifying and summarizing the methods for modeling code naturalness; (3) summarizing the experimental validation methods and evaluation metrics for code naturalness models; (4) summarizing and categorizing the current applications of code naturalness; (5) summarizing the key issues in code naturalness techniques; (6) outlining the prospects for the future development of code naturalness techniques. © 2022 Chinese Academy of Sciences. All rights reserved.
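To make the idea of a "code naturalness model" concrete: Hindle et al. estimated the naturalness of code by training n-gram language models on source code and measuring cross-entropy (average bits per token) on unseen code, where lower cross-entropy means more predictable, or more "natural", code. The following is a minimal sketch of that idea, not the method of any surveyed paper. It assumes whitespace tokenization and add-one (Laplace) smoothing, both simplifications chosen here for brevity; published models use real lexers and stronger smoothing. The names `ngrams` and `NGramModel` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Pad a token sequence with sentence markers and yield its n-grams as tuples."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    for i in range(len(padded) - n + 1):
        yield tuple(padded[i:i + n])


class NGramModel:
    """Hypothetical minimal n-gram language model with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # count of each full n-gram
        self.context_counts = Counter()  # count of each (n-1)-token context
        self.vocab = set()               # tokens that can be predicted

    def train(self, corpus):
        for tokens in corpus:
            self.vocab.update(tokens)
            for gram in ngrams(tokens, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
        self.vocab.add("</s>")

    def prob(self, gram):
        # Add one to every count so unseen n-grams get non-zero probability;
        # the extra +1 in the vocabulary size reserves mass for unknown tokens.
        v = len(self.vocab) + 1
        return (self.ngram_counts[gram] + 1) / (self.context_counts[gram[:-1]] + v)

    def cross_entropy(self, tokens):
        """Average negative log2 probability per predicted token (bits/token)."""
        grams = list(ngrams(tokens, self.n))
        return -sum(math.log2(self.prob(g)) for g in grams) / len(grams)


# Usage: a loop header resembling the training data is more "natural"
# (lower cross-entropy) than a scrambled sequence of similar tokens.
train = [
    "for ( int i = 0 ; i < n ; i ++ )".split(),
    "for ( int j = 0 ; j < m ; j ++ )".split(),
]
model = NGramModel(n=3)
model.train(train)
familiar = "for ( int k = 0 ; k < n ; k ++ )".split()
unusual = "n ++ ; ) for int ( < k = 0".split()
print(model.cross_entropy(familiar) < model.cross_entropy(unusual))  # True
```

With such a tiny corpus the absolute values mean little; only the comparison between the two sequences illustrates how cross-entropy serves as a naturalness score.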
Pages: 3015-3034
Number of pages: 19
References
90 in total
  • [1] Hindle A, Barr ET, Su Z, et al., On the naturalness of software, Proc. of the 34th Int’l Conf. on Software Engineering (ICSE), pp. 837-847, (2012)
  • [2] Hirschberg J, Manning CD., Advances in natural language processing, Science, 349, 6245, (2015)
  • [3] Cambria E, White B., Jumping NLP curves: A review of natural language processing research, IEEE Computational Intelligence Magazine, 9, 2, (2014)
  • [4] Khurana D, Koli A, Khatter K, et al., Natural language processing: State of the art, current trends and challenges, (2017)
  • [5] Sharma A, Tian Y, Lo D., NIRMAL: Automatic identification of software relevant tweets leveraging language model, Proc. of the 22nd IEEE Int’l Conf. on Software Analysis, Evolution, and Reengineering (SANER), pp. 449-458, (2015)
  • [6] Gabel M, Su Z., A study of the uniqueness of source code, Proc. of the 18th ACM SIGSOFT Int’l Symp. on Foundations of Software Engineering (FSE 2010), pp. 147-156, (2010)
  • [7] Casalnuovo C, Lee K, Wang H, et al., Do people prefer “natural” code?, (2019)
  • [8] Tu Z, Su Z, Devanbu P., On the localness of software, Proc. of the 22nd ACM SIGSOFT Int’l Symp. on Foundations of Software Engineering, pp. 269-280, (2014)
  • [9] Yang Y, Jiang Y, Gu M, et al., A language model for statements of software code, Proc. of the 32nd IEEE/ACM Int’l Conf. on Automated Software Engineering (ASE), pp. 682-687, (2017)
  • [10] Allamanis M, Tarlow D, Gordon AD, et al., Bimodal modelling of source code and natural language, Proc. of the 32nd Int’l Conf. on Machine Learning, 37, pp. 2123-2132, (2015)