To what extent do DNN-based image classification models make unreliable inferences?

Cited by: 15

Authors:

Tian, Yongqiang [1]

Ma, Shiqing [2]

Wen, Ming [3]

Liu, Yepang [4]

Cheung, Shing-Chi [1]

Zhang, Xiangyu [5]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China

[2] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ USA

[3] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan, Hubei, Peoples R China

[4] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Guangdong, Peoples R China

[5] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA

Source:

EMPIRICAL SOFTWARE ENGINEERING | 2021 / Volume 26 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

Deep learning; Metamorphic testing; Software engineering for AI;

DOI:

10.1007/s10664-021-09985-1

Chinese Library Classification number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Deep Neural Network (DNN) models are widely used for image classification. While they offer high performance in terms of accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) classification results with different labels, or the same labels but less certainty, after the relevant features of images are corrupted (MR-1), and (b) classification results with the same labels after irrelevant features are corrupted (MR-2). Inferences that violate the metamorphic relations are regarded as unreliable. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models, with an average precision of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive: for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences, and found that they can cause significant degradation of a model's overall accuracy. After including these unreliable inferences from the test set, a model's accuracy can change significantly. Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored the correlation between model accuracy and the size of unreliable inferences. We found that inferences on inputs with smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but may not necessarily prevent a model from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
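
The two metamorphic relations in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. The abstract does not say how features are corrupted or how certainty is measured, so the sketch assumes that corruption means overwriting pixels with a constant and that certainty is the confidence score returned by the model. All names (`corrupt`, `violates_mr1`, `violates_mr2`, and the two toy models) are hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]  # True marks pixels that belong to the target object
Model = Callable[[Image], Tuple[str, float]]  # returns (label, confidence)


def corrupt(image: Image, mask: Mask, corrupt_object: bool, fill: float = 0.0) -> Image:
    """Copy the image, replacing object pixels (or background pixels) with `fill`.

    Assumption: constant-fill corruption stands in for whatever mutation the paper uses.
    """
    return [
        [fill if m == corrupt_object else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def violates_mr1(model: Model, image: Image, mask: Mask) -> bool:
    """MR-1: corrupting the object should change the label or lower the confidence."""
    label, conf = model(image)
    new_label, new_conf = model(corrupt(image, mask, corrupt_object=True))
    return new_label == label and new_conf >= conf


def violates_mr2(model: Model, image: Image, mask: Mask) -> bool:
    """MR-2: corrupting the background should leave the label unchanged."""
    label, _ = model(image)
    new_label, _ = model(corrupt(image, mask, corrupt_object=False))
    return new_label != label


def object_model(image: Image) -> Tuple[str, float]:
    """Toy model that looks only at the centre pixel (the object)."""
    score = image[1][1]
    return ("cat" if score > 0.5 else "other", max(score, 1 - score))


def background_model(image: Image) -> Tuple[str, float]:
    """Toy model that looks only at the border pixels (the background)."""
    border = [image[r][c] for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    score = sum(border) / len(border)
    return ("cat" if score > 0.5 else "other", max(score, 1 - score))


# A 3x3 toy image whose object is the single centre pixel.
image = [[0.8, 0.8, 0.8], [0.8, 0.9, 0.8], [0.8, 0.8, 0.8]]
mask = [[False, False, False], [False, True, False], [False, False, False]]

for name, model in [("object_model", object_model), ("background_model", background_model)]:
    unreliable = violates_mr1(model, image, mask) or violates_mr2(model, image, mask)
    print(name, "unreliable" if unreliable else "reliable")
```

In this toy setting the model that reads only the background violates both relations, while the model that reads the object satisfies both. A real study would need object annotations, such as the bounding boxes or segmentation masks shipped with ImageNet and COCO, to build the mask.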


Pages: 40

