To what extent do DNN-based image classification models make unreliable inferences?

Cited by: 15
Authors
Tian, Yongqiang [1 ]
Ma, Shiqing [2 ]
Wen, Ming [3 ]
Liu, Yepang [4 ]
Cheung, Shing-Chi [1 ]
Zhang, Xiangyu [5 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ USA
[3] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan, Hubei, Peoples R China
[4] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Guangdong, Peoples R China
[5] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Metamorphic testing; Software engineering for AI;
DOI
10.1007/s10664-021-09985-1
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Deep Neural Network (DNN) models are widely used for image classification. While they achieve high accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect that (a) after the relevant features of an image are corrupted, the model yields a different label, or the same label with less certainty, and (b) after irrelevant features are corrupted, the model yields the same label. Inferences that violate the metamorphic relations are regarded as unreliable. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models, with an average precision of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive: for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences and found that they can cause significant degradation of a model's overall accuracy. When these unreliable inferences from the test set are taken into account, the model's accuracy can change significantly. Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored the correlation between model accuracy and the size of unreliable inferences. We found that inferences on inputs with smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but may not necessarily prevent them from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
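The two metamorphic relations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how features are corrupted or how object regions are obtained, so the constant-fill corruption, the object mask, the (label, confidence) classifier interface, and the two toy classifiers below are all assumptions made for illustration, and only the single-label case is covered.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]                          # grayscale pixels (assumed format)
Mask = List[List[bool]]                            # True where the target object is (assumed given)
Classifier = Callable[[Image], Tuple[str, float]]  # hypothetical: returns (label, confidence)


def corrupt(image: Image, mask: Mask, corrupt_inside: bool, fill: float = 0.0) -> Image:
    """Copy the image, overwriting pixels inside (or outside) the mask with `fill`.

    Constant fill is an assumed corruption; the abstract does not specify one.
    """
    return [
        [fill if m == corrupt_inside else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def violates_mr1(classify: Classifier, image: Image, mask: Mask) -> bool:
    """MR-1: corrupting relevant (object) features should change the label
    or keep the label with less certainty. Anything else is a violation."""
    label, conf = classify(image)
    new_label, new_conf = classify(corrupt(image, mask, corrupt_inside=True))
    return new_label == label and new_conf >= conf


def violates_mr2(classify: Classifier, image: Image, mask: Mask) -> bool:
    """MR-2: corrupting irrelevant (background) features should keep the label."""
    label, _ = classify(image)
    new_label, _ = classify(corrupt(image, mask, corrupt_inside=False))
    return new_label != label


# Toy 3x3 input: the object is the centre pixel, everything else is background.
image: Image = [[0.9, 0.9, 0.9], [0.9, 0.5, 0.9], [0.9, 0.9, 0.9]]
mask: Mask = [[False, False, False], [False, True, False], [False, False, False]]


def background_classifier(img: Image) -> Tuple[str, float]:
    """Hypothetical model that only looks at a background pixel."""
    return ("boat", 0.9) if img[0][0] > 0.5 else ("other", 0.6)


def object_classifier(img: Image) -> Tuple[str, float]:
    """Hypothetical model that only looks at the object pixel."""
    return ("boat", 0.9) if img[1][1] > 0.25 else ("other", 0.6)


for name, clf in [("background", background_classifier), ("object", object_classifier)]:
    print(name, violates_mr1(clf, image, mask), violates_mr2(clf, image, mask))
```

In this toy setup the background-driven classifier violates both relations, so its inference would be flagged as unreliable, while the object-driven classifier satisfies both.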
Pages: 40