Comparing and experimenting machine learning techniques for code smell detection

Cited by: 257
Authors
Fontana, Francesca Arcelli [1]
Mantyla, Mika V. [4, 5]
Zanoni, Marco [2]
Marino, Alessandro [3]
Affiliations
[1] Univ Milano Bicocca, Dept Comp Sci, Milan, Italy
[2] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
[3] Univ Milano Bicocca, Milan, Italy
[4] Univ Oulu, Software Engn, Oulu, Finland
[5] Aalto Univ, Helsinki, Finland
Keywords
Code smells detection; Machine learning techniques; Benchmark for code smell detection; BAD SMELLS; QUALITY; CLASSIFICATION
DOI
10.1007/s10664-015-9378-4
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Several code smell detection tools have been developed, and they produce different results because smells can be subjectively interpreted, and hence detected, in different ways. In this paper we perform what is, to the best of our knowledge, the largest experiment on applying machine learning algorithms to code smell detection. We experiment with 16 different machine learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) and 74 software systems, with 1986 manually validated code smell samples. We found that all algorithms achieved high performance on the cross-validation data set; the highest performance was obtained by J48 and Random Forest, while the worst performance was achieved by support vector machines. However, the lower prevalence of code smells, i.e., imbalanced data, in the entire data set caused varying performance, which needs to be addressed in future studies. We conclude that applying machine learning to the detection of these code smells can provide high accuracy (>96%), and that only a hundred training examples are needed to reach at least 95% accuracy.
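The abstract's caveat about imbalanced data can be illustrated with a small sketch. The numbers below are hypothetical and are not taken from the paper: assuming 1000 code elements of which 20 are smelly, a classifier that never reports a smell still reaches 98% accuracy while detecting nothing. This is why accuracy alone can mislead at low prevalence. The F-measure is used here only as one common complementary metric.

```python
def confusion(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 means 'smelly'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of elements labelled correctly."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical, imbalanced ground truth: 20 smelly elements, 980 clean ones.
y_true = [1] * 20 + [0] * 980

# Baseline that never reports a smell.
always_clean = [0] * 1000

# Hypothetical detector: finds 15 of the 20 smells and raises 10 false alarms.
detector = [1] * 15 + [0] * 5 + [1] * 10 + [0] * 970

for name, pred in [("always_clean", always_clean), ("detector", detector)]:
    print(name, round(accuracy(y_true, pred), 3), round(f_measure(y_true, pred), 3))
```

The two classifiers differ by less than one percentage point in accuracy, while the F-measure separates them clearly.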
Pages: 1143 - 1191
Number of pages: 49
Related papers
50 items in total
  • [21] A Review on Machine-learning Based Code Smell Detection Techniques in Object-oriented Software System(s)
    Kaur, Amandeep
    Jain, Sushma
    Goel, Shivani
    Dhiman, Gaurav
    RECENT ADVANCES IN ELECTRICAL & ELECTRONIC ENGINEERING, 2021, 14 (03) : 290 - 303
  • [22] Deep Learning Based Code Smell Detection
    Liu, Hui
    Jin, Jiahao
    Xu, Zhifeng
    Zou, Yanzhen
    Bu, Yifan
    Zhang, Lu
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2021, 47 (09) : 1811 - 1837
  • [23] A Semisupervised Learning Approach for Code Smell Detection
    Ishita Kheria
    Dhruv Gada
    Ruhina Karani
    SN Computer Science, 6 (2)
  • [24] Bad Smell Detection Using Machine Learning Techniques: A Systematic Literature Review
    Ahmed Al-Shaaby
    Hamoud Aljamaan
    Mohammad Alshayeb
    Arabian Journal for Science and Engineering, 2020, 45 : 2341 - 2369
  • [25] Bad Smell Detection Using Machine Learning Techniques: A Systematic Literature Review
    Al-Shaaby, Ahmed
    Aljamaan, Hamoud
    Alshayeb, Mohammad
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2020, 45 (04) : 2341 - 2369
  • [26] Improving Code Smell Detection by Reducing Dimensionality Using Ensemble Feature Selection and Machine Learning
    Nandini A.
    Singh R.
    Rathee A.
    SN Computer Science, 5 (6)
  • [27] On the relative value of imbalanced learning for code smell detection
    Li, Fuyang
    Zou, Kuan
    Keung, Jacky Wai
    Yu, Xiao
    Feng, Shuo
    Xiao, Yan
    SOFTWARE-PRACTICE & EXPERIENCE, 2023, 53 (10) : 1902 - 1927
  • [28] A Support Vector Machine based Approach for Code Smell Detection
    Kaur, Amandeep
    Jain, Sushma
    Goel, Shivani
    2017 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND DATA SCIENCE (MLDS 2017), 2017 : 9 - 14
  • [29] Machine Learning Techniques for Code Smells Detection: A Systematic Mapping Study
    Caram, Frederico Luiz
    De Oliveira Rodrigues, Bruno Rafael
    Campanelli, Amadeu Silveira
    Parreiras, Fernando Silva
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2019, 29 (02) : 285 - 316
  • [30] Applying machine learning techniques for detection of malicious code in network traffic
    Elovici, Yuval
    Shabtai, Asaf
    Moskovitch, Robert
    Tahan, Gil
    Glezer, Chanan
    KI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4667 : 44 - +