WaveNet: Wavelet Network With Knowledge Distillation for RGB-T Salient Object Detection

Cited by: 59
Authors
Zhou, Wujie [1]
Sun, Fan [1, 2]
Jiang, Qiuping [3]
Cong, Runmin [4]
Hwang, Jenq-Neng [5]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 308232, Singapore
[3] Ningbo Univ, Sch Informat Sci & Engn, Ningbo 315211, Peoples R China
[4] Shandong Univ, Sch Control Sci & Engn, Jinan, Peoples R China
[5] Univ Washington, Dept Elect Engn, Seattle, WA 98105 USA
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Feature extraction; Discrete wavelet transforms; Training; Knowledge engineering; Cross layer design; Convolutional neural networks; Wavelet; knowledge distillation; discrete wavelet transform; progressively stretched sine-cosine module; edge-aware module; FUSION; IMAGE;
DOI
10.1109/TIP.2023.3275538
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, various neural network architectures for computer vision have been devised, such as the visual transformer and multilayer perceptron (MLP). A transformer based on an attention mechanism can outperform a traditional convolutional neural network. Compared with the convolutional neural network and transformer, the MLP introduces less inductive bias and achieves stronger generalization. In addition, a transformer shows an exponential increase in the inference, training, and debugging times. Considering a wave function representation, we propose the WaveNet architecture that adopts a novel vision task-oriented wavelet-based MLP for feature extraction to perform salient object detection in RGB (red-green-blue)-thermal infrared images. In addition, we apply knowledge distillation to a transformer as an advanced teacher network to acquire rich semantic and geometric information and guide WaveNet learning with this information. Following the shortest-path concept, we adopt the Kullback-Leibler distance as a regularization term for the RGB features to be as similar to the thermal infrared features as possible. The discrete wavelet transform allows for the examination of frequency-domain features in a local time domain and time-domain features in a local frequency domain. We apply this representation ability to perform cross-modality feature fusion. Specifically, we introduce a progressively cascaded sine-cosine module for cross-layer feature fusion and use low-level features to obtain clear boundaries of salient objects through the MLP. Results from extensive experiments indicate that the proposed WaveNet achieves impressive performance on benchmark RGB-thermal infrared datasets. The results and code are publicly available at https://github.com/nowander/WaveNet.
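The Kullback-Leibler regularization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`softmax`, `kl_divergence`, `kl_regularizer`), the softmax normalization of the features, and the direction of the divergence (thermal as the reference distribution) are all assumptions made for illustration only.

```python
import math


def softmax(values):
    """Turn a flat list of feature activations into a probability distribution."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def kl_regularizer(rgb_features, thermal_features):
    """Hypothetical regularization term that is small when the RGB feature
    distribution is close to the thermal one.

    Assumption: features are flattened to lists and normalized with softmax,
    and the thermal distribution is treated as the reference.
    """
    p_thermal = softmax(thermal_features)
    q_rgb = softmax(rgb_features)
    return kl_divergence(p_thermal, q_rgb)


if __name__ == "__main__":
    rgb = [1.0, 2.0, 3.0]
    thermal = [3.0, 2.0, 1.0]
    print("KL term, mismatched features:", kl_regularizer(rgb, thermal))
    print("KL term, identical features:", kl_regularizer(rgb, rgb))
```

In a training loop, a term like this would typically be added to the main detection loss with a weighting factor, so that minimizing the total loss also pulls the RGB features toward the thermal ones.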
Pages: 3027-3039
Page count: 13