Aries: Efficient Testing of Deep Neural Networks via Labeling-Free Accuracy Estimation

Cited by: 7
Authors
Hu, Qiang [1]
Guo, Yuejun [2]
Xie, Xiaofei [3]
Cordy, Maxime [1]
Papadakis, Mike [1]
Ma, Lei [4, 5]
Le Traon, Yves [1]
Affiliations
[1] Univ Luxembourg, Luxembourg, Luxembourg
[2] Luxembourg Inst Sci & Technol, Luxembourg, Luxembourg
[3] Singapore Management Univ, Singapore, Singapore
[4] Univ Alberta, Edmonton, AB, Canada
[5] Univ Tokyo, Tokyo, Japan
Source
2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE | 2023
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
deep learning testing; performance estimation; distribution shift;
DOI
10.1109/ICSE48619.2023.00152
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Deep learning (DL) plays an increasingly important role in daily life owing to its competitive performance in industrial application domains. As the core of DL-enabled systems, deep neural networks (DNNs) need to be carefully evaluated to ensure that the produced models meet the expected requirements. In practice, the de facto industry standard for assessing the quality of DNNs is to check their performance (accuracy) on a collected set of labeled test data. However, preparing such labeled data is often difficult, partly because data labeling is labor-intensive, especially given the massive amount of new unlabeled data arriving every day. Recent studies show that test selection for DNNs is a promising direction that tackles this issue by selecting a minimal set of representative data to label and using these data to assess the model. However, it still requires human effort and cannot be fully automated. In this paper, we propose a novel technique, named Aries, that can estimate the performance of DNNs on new unlabeled data using only the information obtained from the original test data. The key insight behind our technique is that the model should have similar prediction accuracy on data that lie at similar distances from the decision boundary. We performed a large-scale evaluation of our technique on two well-known datasets, CIFAR-10 and Tiny-ImageNet, four widely studied DNN models including ResNet101 and DenseNet-121, and 13 types of data transformation methods. Results show that the accuracy estimated by Aries is only 0.03% to 2.60% off the true accuracy. In addition, Aries outperforms the state-of-the-art labeling-free methods in 50 out of 52 cases and selection-labeling-based methods in 96 out of 128 cases.
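The key insight in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how distance to the decision boundary is measured, so the code below assumes a simple stand-in, the gap between the two largest softmax probabilities, and assumes equal-width buckets. The names `margin`, `bucket_of`, and `estimate_accuracy` are hypothetical. The idea shown is only this: measure accuracy per bucket on the labeled test data, then reuse those per-bucket accuracies for unlabeled inputs that fall into the same buckets.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities.

    ASSUMPTION: used here as a proxy for distance to the decision boundary;
    the abstract does not specify the actual distance measure.
    """
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def bucket_of(m: float, n_buckets: int) -> int:
    """Map a margin in [0, 1] to one of n_buckets equal-width buckets."""
    return min(int(m * n_buckets), n_buckets - 1)


def estimate_accuracy(
    test_probs: List[Sequence[float]],
    test_labels: List[int],
    new_probs: List[Sequence[float]],
    n_buckets: int = 10,
) -> float:
    """Estimate accuracy on unlabeled data from per-bucket labeled accuracy."""
    hits = [0] * n_buckets
    counts = [0] * n_buckets
    for probs, label in zip(test_probs, test_labels):
        b = bucket_of(margin(probs), n_buckets)
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        counts[b] += 1
        hits[b] += int(predicted == label)

    # Buckets with no labeled samples fall back to the overall labeled accuracy.
    overall = sum(hits) / sum(counts)
    bucket_acc = [
        hits[b] / counts[b] if counts[b] else overall for b in range(n_buckets)
    ]

    # Each unlabeled input contributes the accuracy of the bucket it lands in.
    expected_correct = sum(
        bucket_acc[bucket_of(margin(p), n_buckets)] for p in new_probs
    )
    return expected_correct / len(new_probs)


if __name__ == "__main__":
    labeled = [[0.9, 0.1], [0.95, 0.05], [0.55, 0.45], [0.6, 0.4]]
    labels = [0, 0, 1, 0]
    unlabeled = [[0.9, 0.1], [0.52, 0.48]]
    print(estimate_accuracy(labeled, labels, unlabeled, n_buckets=2))
```

In the toy run, the high-margin bucket is fully correct on the labeled data and the low-margin bucket is half correct, so an unlabeled set with one input in each bucket gets an estimate halfway between the two. No labels for the new data are needed, which is the property the abstract emphasizes.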
Pages: 1776-1787
Page count: 12
Related papers
50 in total
  • [31] DeepConcolic: Testing and Debugging Deep Neural Networks
    Sun, Youcheng
    Huang, Xiaowei
    Kroening, Daniel
    Sharp, James
    Hill, Matthew
    Ashmore, Rob
    2019 IEEE/ACM 41ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2019), 2019, : 111 - 114
  • [32] Testing for Multiple Faults in Deep Neural Networks
    Moussa, Dina A.
    Hefenbrock, Michael
    Tahoori, Mehdi
    IEEE DESIGN & TEST, 2024, 41 (03) : 47 - 53
  • [33] Seed Selection for Testing Deep Neural Networks
    Zhi, Yuhan
    Xie, Xiaofei
    Shen, Chao
    Sun, Jun
    Zhang, Xiaoyu
    Guan, Xiaohong
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (01)
  • [34] Outdoor Scene Labeling Using Deep Convolutional Neural Networks
    Wen Jun
    Zhong Chaolliang
    Liu Shirong
    Wang Jian
    2015 34TH CHINESE CONTROL CONFERENCE (CCC), 2015, : 3953 - 3958
  • [35] Dense Image Labeling Using Deep Convolutional Neural Networks
    Islam, Md Amirul
    Bruce, Neil
    Wang, Yang
    2016 13TH CONFERENCE ON COMPUTER AND ROBOT VISION (CRV), 2016, : 16 - 23
  • [36] Deep and Wide Neural Networks Covariance Estimation
    Arratia, Argimiro
    Cabana, Alejandra
    Rafael Leon, Jose
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT I, 2020, 12396 : 195 - 206
  • [37] Deep Neural Networks for Propensity Score Estimation
    Guzman-Alvarez, Alberto
    Qin, Xu
    Scott, Paul W.
    MULTIVARIATE BEHAVIORAL RESEARCH, 2022, 57 (01) : 164 - 165
  • [38] Estimation of ground motion parameters via multi-task deep neural networks
    Meng, Fanchun
    Ren, Tao
    Guo, Enming
    Chen, Hongfeng
    Liu, Xinliang
    Zhang, Haodong
    Li, Jiang
    NATURAL HAZARDS, 2024, 120 (07) : 6737 - 6754
  • [39] Estimation of ground motion parameters via multi-task deep neural networks
    Fanchun Meng
    Tao Ren
    Enming Guo
    Hongfeng Chen
    Xinliang Liu
    Haodong Zhang
    Jiang Li
    Natural Hazards, 2024, 120 : 6737 - 6754
  • [40] An Efficient Accelerator for Deep Convolutional Neural Networks
    Kuo, Yi-Xian
    Lai, Yeong-Kang
    2020 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TAIWAN), 2020,