To what extent do DNN-based image classification models make unreliable inferences?

Cited by: 15
Authors
Tian, Yongqiang [1]
Ma, Shiqing [2]
Wen, Ming [3]
Liu, Yepang [4]
Cheung, Shing-Chi [1]
Zhang, Xiangyu [5]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ USA
[3] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan, Hubei, Peoples R China
[4] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Guangdong, Peoples R China
[5] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Metamorphic testing; Software engineering for AI
DOI
10.1007/s10664-021-09985-1
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Deep Neural Network (DNN) models are widely used for image classification. While they achieve high accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) a different label, or the same label with less certainty, after the relevant features of an image are corrupted, and (b) the same label after irrelevant features are corrupted. Inferences that violate the metamorphic relations are regarded as unreliable. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models, with an average precision of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. We applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive: for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences and found that they can cause significant degradation of a model's overall accuracy. Once these unreliable inferences from the test set are taken into account, a model's accuracy can change significantly. Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored the correlation between model accuracy and the size of unreliable inferences. We found that inferences on inputs with smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but may not necessarily prevent them from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
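The two metamorphic relations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It takes relation (a) as MR-1 and relation (b) as MR-2, which is an assumption about the numbering, and it represents an image as a nested list with a boolean object mask. The names `corrupt`, `violates_mr1`, `violates_mr2`, the constant-fill corruption, the toy classifiers, and the choice to count an unchanged confidence as a violation of MR-1 are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]   # grayscale image as rows of pixel values
Mask = List[List[bool]]     # True where a pixel belongs to the target object
Classifier = Callable[[Image], Tuple[str, float]]  # returns (label, confidence)


def corrupt(image: Image, mask: Mask, corrupt_object: bool, fill: float = 0.0) -> Image:
    """Overwrite either the object pixels or the background pixels with `fill`.

    Constant fill is an assumed, simplistic corruption strategy.
    """
    return [
        [fill if m == corrupt_object else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def violates_mr1(classify: Classifier, image: Image, mask: Mask) -> bool:
    """MR-1: corrupting the object should change the label or lower the confidence."""
    label, conf = classify(image)
    new_label, new_conf = classify(corrupt(image, mask, corrupt_object=True))
    # Assumption: equal confidence also counts as "not less certain".
    return new_label == label and new_conf >= conf


def violates_mr2(classify: Classifier, image: Image, mask: Mask) -> bool:
    """MR-2: corrupting the background should leave the label unchanged."""
    label, _ = classify(image)
    new_label, _ = classify(corrupt(image, mask, corrupt_object=False))
    return new_label != label


# Toy data: the object is the bottom-right pixel, everything else is background.
image = [[1.0, 1.0], [1.0, 0.8]]
mask = [[False, False], [False, True]]


def background_biased(img: Image) -> Tuple[str, float]:
    """Hypothetical model that looks only at a background pixel."""
    return ("boat", 0.9) if img[0][0] > 0.5 else ("other", 0.6)


def object_focused(img: Image) -> Tuple[str, float]:
    """Hypothetical model that looks only at the object pixel."""
    return ("boat", 0.9) if img[1][1] > 0.5 else ("other", 0.6)


for name, model in [("background_biased", background_biased),
                    ("object_focused", object_focused)]:
    print(name, violates_mr1(model, image, mask), violates_mr2(model, image, mask))
```

In this toy setting the background-biased model violates both relations, so its inference would be flagged as unreliable, while the object-focused model violates neither.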
Pages: 40