On the Naturalness of Auto-generated Code: Can We Identify Auto-Generated Code Automatically?

Cited by: 0
Authors
Doi, Masayuki [1]
Higo, Yoshiki [1]
Arima, Ryo [1]
Shimonaka, Kento [1]
Kusumoto, Shinji [1]
Affiliations
[1] Osaka Univ, Suita, Osaka, Japan
Keywords
Auto-generated code; N-gram language model; Source code analysis;
DOI
10.1145/3196321.3196356
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Recently, a variety of studies have been conducted on source code analysis. If the target source code includes auto-generated code, that code is usually removed in a preprocessing phase because its presence may negatively affect the analysis. A straightforward way to remove auto-generated code is to search for the special comments that are included in auto-generated files. However, this approach fails if such comments have been removed for some reason, and inspecting source files one by one manually takes too much effort. In this paper, we propose a new technique that identifies auto-generated code by using its naturalness. We used a golden set that includes thousands of handmade source files and source files generated by four kinds of compiler-compilers. Through an evaluation with this dataset, we confirmed that our technique identified auto-generated code with over 99% precision and recall in all cases.
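The abstract does not describe the implementation beyond the keyword "N-gram language model", so the following is only an illustrative sketch of how a naturalness-based check might look, not the authors' technique. It assumes a bigram model with add-one smoothing, a simple regex tokenizer, and a decision rule that compares the cross-entropy of a file under a model trained on generated code with its cross-entropy under a model trained on handmade code. All names (`tokenize`, `BigramModel`, `train_models`, `looks_generated`) are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a crude lexer that splits identifiers, numbers, and single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
START = "<s>"  # start-of-file marker


def tokenize(source):
    return TOKEN_RE.findall(source)


class BigramModel:
    """Bigram language model with additive (Laplace) smoothing."""

    def __init__(self, sources, vocab_size, alpha=1.0):
        self.vocab_size = vocab_size
        self.alpha = alpha
        self.bigrams = Counter()   # counts of (previous token, token)
        self.contexts = Counter()  # counts of previous token
        for source in sources:
            tokens = [START] + tokenize(source)
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def cross_entropy(self, source):
        """Average bits per token; lower means more predictable ("natural")."""
        tokens = [START] + tokenize(source)
        if len(tokens) < 2:
            return 0.0
        total_bits = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            p = (self.bigrams[(prev, cur)] + self.alpha) / (
                self.contexts[prev] + self.alpha * self.vocab_size
            )
            total_bits -= math.log2(p)
        return total_bits / (len(tokens) - 1)


def train_models(generated_sources, handmade_sources):
    """Train one model per class over a shared vocabulary size."""
    all_sources = list(generated_sources) + list(handmade_sources)
    vocab = {tok for src in all_sources for tok in tokenize(src)}
    vocab_size = len(vocab) + 2  # start marker plus one slot for unseen tokens
    return (
        BigramModel(generated_sources, vocab_size),
        BigramModel(handmade_sources, vocab_size),
    )


def looks_generated(source, generated_model, handmade_model):
    """Assumed rule: pick the class whose model finds the file less surprising."""
    return generated_model.cross_entropy(source) < handmade_model.cross_entropy(source)
```

Sharing one vocabulary size between the two models keeps the smoothing penalty for unseen tokens identical, so the comparison reflects the training data rather than vocabulary size. A realistic setup would likely use a language-aware lexer, higher-order n-grams, and a threshold tuned on held-out files.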
Pages: 340-343
Number of pages: 4
Related papers
50 in total
  • [1] Experiments with Auto-generated Socratic Dialogue for Source Code Understanding
    Alshaikh, Zeyad
    Tamang, Lasang
    Rus, Vasile
    CSEDU: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED EDUCATION - VOL 2, 2021, : 35 - 44
  • [2] Identifying Auto-Generated Code by Using Machine Learning Techniques
    Shimonaka, Kento
    Sumi, Soichi
    Higo, Yoshiki
    Kusumoto, Shinji
    PROCEEDINGS 7TH INTERNATIONAL WORKSHOP ON EMPIRICAL SOFTWARE ENGINEERING IN PRACTICE (IWESEP 2016), 2016, : 18 - 23
  • [3] Simulation with Consideration of Hardware Characteristics and Auto-generated Code Using MATLAB/Simulink
    Moon, Tae-Yoon
    Seo, Suk-Hyun
    Kim, Jin-Ho
    Hwang, Sung-Ho
    Jeon, Jae Wook
    2007 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS, VOLS 1-6, 2007, : 336 - +
  • [4] Generating Code Review Documentation for Auto-Generated Mission-Critical Software
    Denney, Ewen
    Fischer, Bernd
    SMC-IT 2009: THIRD IEEE INTERNATIONAL CONFERENCE ON SPACE MISSION CHALLENGES FOR INFORMATION TECHNOLOGY, PROCEEDINGS, 2009, : 394 - +
  • [5] Towards Auto-Generated Data Systems
    Cheung, Alvin
    Ahmad, Maaz Bin Safeer
    Haynes, Brandon
    Kittivorawong, Chanwut
    Laddad, Shadaj
    Liu, Xiaoxuan
    Wang, Chenglong
    Yan, Cong
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (12): : 4116 - 4129
  • [6] Auto-generated Strokes for Motion Segmentation
    Tian, Zhiqiang
    Xue, Jianru
    Li, Ce
    Lan, Xuguang
    Zheng, Nanning
    2011 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2011, : 857 - 860
  • [7] Verifying Auto-generated C Code from Simulink: An Experience Report in the Automotive Domain
    Berger, Philipp
    Katoen, Joost-Pieter
    Abraham, Erika
    Bin Waez, Md Tawhid
    Rambow, Thomas
    FORMAL METHODS, 2018, 10951 : 312 - 328
  • [8] An Evaluation Model for Auto-generated Cognitive Scripts
    ELMougi, Ahmed M.
    Omar, Yasser M. K.
    Hodhod, Rania
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (08) : 333 - 340
  • [9] Human Experts' Perceptions of Auto-Generated Summarization Quality
    Lotfigolian, Maryam
    Papanikolaou, Christos
    Taghizadeh, Samaneh
    Sandnes, Frode Eika
    PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2023, 2023, : 95 - 98
  • [10] Mining Auto-Generated Test Inputs for Test Oracle
    Xu, Weifeng
    Wang, Hanlin
    Ding, Tao
    PROCEEDINGS OF THE 2013 10TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2013, : 89 - 94