Gradient-Based Instance-Specific Visual Explanations for Object Specification and Object Discrimination

Cited by: 2
Authors
Zhao, Chenyang [1 ]
Hsiao, Janet H. [2 ]
Chan, Antoni B. [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Div Social Sci, Hong Kong, Peoples R China
Keywords
Detectors; Visualization; Heat maps; Task analysis; Object detection; Predictive models; Transformers; Deep learning; explainable AI; explaining object detection; gradient-based explanation; human eye gaze; instance-level explanation; knowledge distillation; non-maximum suppression; object discrimination; object specification; NMS
DOI
10.1109/TPAMI.2024.3380604
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Using the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of image regions on the detector's decision for each predicted attribute. Compared with previous work on classification activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to one-stage, two-stage, and transformer-based detectors with different types of backbones and heads, and that it produces higher-quality visual explanations than the state of the art in terms of both effectiveness and efficiency. We discuss two explanation tasks for object detection: 1) object specification: what is the important region for the prediction? and 2) object discrimination: which object is detected? Addressing these two aspects, we present a detailed analysis of the visual explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM. Furthermore, we investigate user trust in the explanation maps, how well the visual explanations of object detectors agree with human explanations as measured through human eye gaze, and whether this agreement is related to user trust. Finally, we propose two applications, ODAM-KD and ODAM-NMS, based on these two abilities of ODAM. ODAM-KD utilizes the object specification of ODAM to generate top-down attention for key predictions and to guide knowledge distillation for object detection. ODAM-NMS considers the location of the model's explanation for each prediction to distinguish duplicate detections of the same object. A training scheme, ODAM-Train, is proposed to improve the quality of object discrimination and to aid ODAM-NMS.
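The abstract describes ODAM as weighting an intermediate feature map by the gradient of a single prediction's attribute, which keeps the explanation specific to one detected instance. Below is a minimal PyTorch sketch of that gradient-weighting idea under stated assumptions: the toy convolutional "backbone", the linear "head", the function name odam_heatmap, and the choice of which score to explain are hypothetical stand-ins for a real detector and a feature map captured with a forward hook; this is an illustration of the general mechanism, not the paper's implementation.

import torch
import torch.nn.functional as F

def odam_heatmap(score, feature_map, image_size):
    # Gradient of one scalar prediction attribute (e.g. a detected instance's
    # class score) with respect to the intermediate feature map.
    grads, = torch.autograd.grad(score, feature_map, retain_graph=True)
    # Element-wise product keeps per-location detail, which is what makes the
    # map instance-specific; Grad-CAM instead pools the gradient into a single
    # weight per channel, giving class-specific maps.
    weighted = F.relu((grads * feature_map).sum(dim=1, keepdim=True))
    heatmap = F.interpolate(weighted, size=image_size, mode="bilinear",
                            align_corners=False)
    heatmap = heatmap - heatmap.min()
    return heatmap / (heatmap.max() + 1e-8)

# Toy stand-in for a detector (hypothetical, for illustration only).
torch.manual_seed(0)
image = torch.randn(1, 3, 64, 64)
backbone = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
head = torch.nn.Linear(8, 5)

features = backbone(image)            # (1, 8, 64, 64) intermediate feature map
pooled = features.mean(dim=(2, 3))    # crude pooling for the toy head
score = head(pooled)[0, 2]            # pretend: class score of one detection

heat = odam_heatmap(score, features, image.shape[-2:])
print(heat.shape)                     # torch.Size([1, 1, 64, 64])

With a real detector one would instead register a forward hook on an intermediate layer (e.g. an FPN level), run inference, and pass the hooked feature map together with one attribute of one detection (a class score or a box coordinate) to the function above; each attribute of each instance then yields its own heat map.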
Pages: 5967-5985
Number of pages: 19