To what extent do DNN-based image classification models make unreliable inferences?

Cited by: 15
Authors
Tian, Yongqiang [1]
Ma, Shiqing [2]
Wen, Ming [3]
Liu, Yepang [4]
Cheung, Shing-Chi [1]
Zhang, Xiangyu [5]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ USA
[3] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan, Hubei, Peoples R China
[4] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Guangdong, Peoples R China
[5] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; Metamorphic testing; Software engineering for AI;
DOI
10.1007/s10664-021-09985-1
CLC Classification
TP31 [Computer Software];
Subject Classification Code
081202; 0835;
Abstract
Deep Neural Network (DNN) models are widely used for image classification. While they offer high accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) classification results with different labels, or the same labels but lower certainty, after the relevant features of an image are corrupted, and (b) classification results with the same labels after irrelevant features are corrupted. Inferences that violate these metamorphic relations are regarded as unreliable. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models, with an average precision of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive: for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences and found that they can cause significant degradation of a model's overall accuracy; when these unreliable inferences are excluded from the test set, a model's accuracy can change significantly.
Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored how object size relates to unreliable inferences, and found that inferences on inputs containing smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but do not necessarily prevent models from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
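The two metamorphic relations described in the abstract can be sketched as executable checks. The following is a minimal illustration, not the authors' implementation: the noise-based corruption operator, the boolean pixel mask marking the object's relevant region, and the toy classifiers are all assumptions made for this sketch.

```python
import numpy as np

def corrupt(image, mask, seed=0):
    """Replace the masked pixels of `image` with uniform random noise."""
    rng = np.random.default_rng(seed)
    out = image.copy()
    noise = rng.uniform(0.0, 1.0, size=image.shape)
    out[mask] = noise[mask]
    return out

def violates_mr1(model, image, relevant_mask):
    """MR-1: after corrupting the relevant features (the object itself),
    the model should output a different label, or the same label with
    lower certainty. Returning True flags an unreliable inference."""
    label, conf = model(image)
    label_c, conf_c = model(corrupt(image, relevant_mask))
    return label_c == label and conf_c >= conf

def violates_mr2(model, image, relevant_mask):
    """MR-2: after corrupting the irrelevant features (the background),
    the model should output the same label. Returning True flags an
    unreliable inference."""
    label, _ = model(image)
    label_c, _ = model(corrupt(image, ~relevant_mask))
    return label_c != label

# Toy setup: an 8x8 image whose "object" occupies the left half.
mask = np.zeros((8, 8), dtype=bool)
mask[:, :4] = True
image = np.zeros((8, 8))
image[mask] = 0.9  # bright object on a dark background

def good_model(img):
    """Classifies by the brightness of the object region only."""
    m = img[:, :4].mean()
    return ("bright", m) if m > 0.5 else ("dark", 1.0 - m)

def bad_model(img):
    """Classifies by the background region only -- i.e., it relies
    entirely on features irrelevant to the object."""
    m = img[:, 4:].mean()
    return ("bright", m) if m > 0.5 else ("dark", 1.0 - m)
```

On this toy input, `bad_model` trips MR-1 (corrupting the object leaves its prediction and certainty untouched, because it never looked at the object), while `good_model` passes both checks. The paper applies the same idea with real DNNs, where relevant regions come from the datasets' object annotations.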
Pages: 40
Related Papers
50 items in total
  • [1] To what extent do DNN-based image classification models make unreliable inferences?
    Tian, Yongqiang
    Ma, Shiqing
    Wen, Ming
    Liu, Yepang
    Cheung, Shing-Chi
    Zhang, Xiangyu
    EMPIRICAL SOFTWARE ENGINEERING, 2021, 26
  • [2] Efficient Image Sensor Subsampling for DNN-Based Image Classification
    Guo, Jia
    Gu, Hongxiang
    Potkonjak, Miodrag
    PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN (ISLPED '18), 2018, : 225 - 230
  • [3] DNN-Based PolSAR Image Classification on Noisy Labels
    Ni, Jun
    Xiang, Deliang
    Lin, Zhiyuan
    Lopez-Martinez, Carlos
    Hu, Wei
    Zhang, Fan
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 3697 - 3713
  • [4] DNN-based Image Classification for Software GUI Testing
    Lu, Huijian
    Wang, Li
    Ye, Minchao
    Yan, Ke
    Jin, Qun
    2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2018, : 1818 - 1823
  • [5] DNN-based Models for Speaker Age and Gender Classification
    Qawaqneh, Zakariya
    Abu Mallouh, Arafat
    Barkana, Buket D.
    PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 4: BIOSIGNALS, 2017, : 106 - 111
  • [6] ToonNet: A cartoon image dataset and a DNN-based semantic classification system
    Zhou, Yanqing
    Jin, Yongxu
    Luo, Anqi
    Chan, Szeyu
    Xiao, Xiangyun
    Yang, Xubo
    PROCEEDINGS OF THE 16TH ACM SIGGRAPH INTERNATIONAL CONFERENCE ON VIRTUAL-REALITY CONTINUUM AND ITS APPLICATIONS IN INDUSTRY (VRCAI 2018), 2018,
  • [7] A COMPARATIVE STUDY OF DNN-BASED MODELS FOR BLIND IMAGE QUALITY PREDICTION
    Yang, Xiaohan
    Li, Fan
    Liu, Hantao
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1019 - 1023
  • [8] Fake Gradient: A Security and Privacy Protection Framework for DNN-based Image Classification
    Feng, Xianglong
    Xie, Yi
    Ye, Mengmei
    Tang, Zhongze
    Yuan, Bo
    Wei, Sheng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5510 - 5518
  • [9] DNN-based Arabic Printed Characters Classification
    Amrouche, Aissa
    PROGRAM OF THE 2ND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND AUTOMATIC CONTROL, ICEEAC 2024, 2024,
  • [10] Attacking DNN-based Intrusion Detection Models
    Zhang, Xingwei
    Zheng, Xiaolong
    Wu, Desheng Dash
    IFAC PAPERSONLINE, 2020, 53 (05): : 415 - 419