The Multimodal Scene Recognition Method Based on Self-Attention and Distillation

Cited by: 0
Authors
Sun, Ning [1 ]
Xu, Wei [1 ]
Liu, Jixin [1 ]
Chai, Lei [1 ]
Sun, Haian [1 ]
Affiliation
[1] Nanjing University of Posts and Telecommunications, Nanjing 210003, People's Republic of China
Keywords
Feature extraction; Training; Image recognition; Transformers; Layout; Convolutional neural networks; Sun; NETWORK
DOI
10.1109/MMUL.2024.3415643
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scene recognition is a challenging task in computer vision because of the diversity of objects in scene images and the ambiguity of object layouts. In recent years, the emergence of multimodal scene data has provided new solutions for scene recognition, but it has also brought new problems. To address these challenges, this article proposes the self-attention and distillation-based multimodal scene recognition network (SAD-MSR). The backbone of the model adopts a pure transformer structure based on self-attention, which can extract local and global spatial features of multimodal scene images. The model uses a multistage fusion mechanism: in the early stage, the concatenated tokens of the two modalities are fused by self-attention, and in the late stage, the high-level features extracted from the two modalities are fused by cross attention. Furthermore, a distillation mechanism is introduced to alleviate the problem of a limited number of training samples. Extensive experiments on two multimodal scene recognition databases, SUN RGB-D and NYU Depth, show the effectiveness of SAD-MSR. Compared with other state-of-the-art multimodal scene recognition methods, our method achieves better experimental results.
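The two fusion stages and the distillation idea described in the abstract can be illustrated with a minimal sketch. This is not the SAD-MSR implementation: the function names (`attention`, `early_fusion`, `late_fusion`, `distillation_loss`) are hypothetical, the attention omits learned projections and multiple heads for brevity, and the temperature-softened KL loss is one common form of distillation assumed here because the abstract does not specify which form the authors use.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention (assumption: no learned Q/K/V projections)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out

def early_fusion(rgb_tokens, depth_tokens):
    """Early stage: concatenate the tokens of both modalities, then self-attend."""
    joint = rgb_tokens + depth_tokens  # concatenation along the token axis
    return attention(joint, joint, joint)

def late_fusion(rgb_feats, depth_feats):
    """Late stage: cross attention, where each modality queries the other."""
    rgb_from_depth = attention(rgb_feats, depth_feats, depth_feats)
    depth_from_rgb = attention(depth_feats, rgb_feats, rgb_feats)
    return rgb_from_depth, depth_from_rgb

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed form: KL(teacher || student) on temperature-softened distributions."""
    p = softmax([t / temperature for t in teacher_logits])
    q = softmax([s / temperature for s in student_logits])
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2

# Toy inputs: two RGB tokens and one depth token, each of dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[0.5, 0.5]]
fused_early = early_fusion(rgb, depth)
rgb_ctx, depth_ctx = late_fusion(rgb, depth)
```

In this toy setup the early stage returns one fused token per input token across both modalities, while the late stage returns one context vector per query token of each modality. A real model would add learned projections, multiple heads, residual connections, and a classification head on top.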
Pages: 25-36
Page count: 12