The Multimodal Scene Recognition Method Based on Self-Attention and Distillation

Cited by: 0
Authors
Sun, Ning [1 ]
Xu, Wei [1 ]
Liu, Jixin [1 ]
Chai, Lei [1 ]
Sun, Haian [1 ]
Institutions
[1] Nanjing Univ Posts & Telecommun, Nanjing 210003, Peoples R China
Keywords
Feature extraction; Training; Image recognition; Transformers; Layout; Convolutional neural networks; Network
DOI
10.1109/MMUL.2024.3415643
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Scene recognition is a challenging task in computer vision because of the diversity of objects in scene images and the ambiguity of object layouts. In recent years, the emergence of multimodal scene data has provided new solutions for scene recognition, but it has also brought new problems. To address these challenges, the self-attention and distillation-based multimodal scene recognition network (SAD-MSR) is proposed in this article. The backbone of the model adopts a pure self-attention transformer structure, which can extract local and global spatial features from multimodal scene images. A multistage fusion mechanism is developed for the model: in the early stage, the concatenated tokens of the two modalities are fused by self-attention, while in the late stage, the high-level features extracted from the two modalities are fused by cross attention. Furthermore, a distillation mechanism is introduced to alleviate the problem of a limited number of training samples. Finally, we conducted extensive experiments on two multimodal scene recognition databases, SUN RGB-D and NYU Depth, to show the effectiveness of SAD-MSR. Compared with other state-of-the-art multimodal scene recognition methods, our method achieves better results.
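The early/late fusion scheme described in the abstract can be sketched with plain scaled dot-product attention. The NumPy snippet below is an illustrative reconstruction only, not the authors' implementation: the token counts, dimensions, and names (`rgb_tokens`, `depth_tokens`) are assumptions, and real transformer blocks would add learned projections, multiple heads, and normalization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

rng = np.random.default_rng(0)
d = 8
rgb_tokens = rng.standard_normal((4, d))    # 4 tokens from the RGB branch
depth_tokens = rng.standard_normal((4, d))  # 4 tokens from the depth branch

# Early-stage fusion: self-attention over the concatenated token
# sequence, so RGB and depth tokens attend to each other directly.
tokens = np.concatenate([rgb_tokens, depth_tokens], axis=0)  # (8, d)
early = attention(tokens, tokens, tokens)                    # (8, d)

# Late-stage fusion: cross attention, where high-level RGB features
# query the high-level depth features.
late = attention(rgb_tokens, depth_tokens, depth_tokens)     # (4, d)
```

The design choice the abstract highlights is exactly this asymmetry: early fusion mixes the modalities at the token level via shared self-attention, while late fusion keeps the branches separate and only exchanges information through cross-attention queries.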
Pages: 25-36
Page count: 12
Related Papers
50 records in total
  • [31] Long-Tailed Recognition Based on Self-attention Mechanism
    Feng, Zekai
    Jia, Hong
    Li, Mengke
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT II, ICIC 2024, 2024, 14876 : 380 - 391
  • [32] A Self-attention Based Model for Offline Handwritten Text Recognition
    Nam Tuan Ly
    Trung Tan Ngo
    Nakagawa, Masaki
    PATTERN RECOGNITION, ACPR 2021, PT II, 2022, 13189 : 356 - 369
  • [33] An adaptive multi-head self-attention coupled with attention filtered LSTM for advanced scene text recognition
    Selvam, Prabu
    Kumar, S. N.
    Kannadhasan, S.
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2025,
  • [34] Double Attention: An Optimization Method for the Self-Attention Mechanism Based on Human Attention
    Zhang, Zeyu
    Li, Bin
    Yan, Chenyang
    Furuichi, Kengo
    Todo, Yuki
    BIOMIMETICS, 2025, 10 (01)
  • [35] A Static Sign Language Recognition Method Enhanced with Self-Attention Mechanisms
    Wang, Yongxin
    Jiang, He
    Sun, Yutong
    Xu, Longqi
    SENSORS, 2024, 24 (21)
  • [36] Spatial self-attention network with self-attention distillation for fine-grained image recognition
    Baffour, Adu Asare
    Qin, Zhen
    Wang, Yong
    Qin, Zhiguang
    Choo, Kim-Kwang Raymond
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 81
  • [37] Remote Sensing Image Scene Classification Based on Global Self-Attention Module
    Li, Qingwen
    Yan, Dongmei
    Wu, Wanrong
    REMOTE SENSING, 2021, 13 (22)
  • [38] Efficient Semantic Segmentation via Self-Attention and Self-Distillation
    An, Shumin
    Liao, Qingmin
    Lu, Zongqing
    Xue, Jing-Hao
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (09) : 15256 - 15266
  • [39] SFusion: Self-attention Based N-to-One Multimodal Fusion Block
    Liu, Zecheng
    Wei, Jia
    Li, Rui
    Zhou, Jianlong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT II, 2023, 14221 : 159 - 169
  • [40] Multimodal Depression Detection Based on Self-Attention Network With Facial Expression and Pupil
    Liu, Xiang
    Shen, Hao
    Li, Huiru
    Tao, Yongfeng
    Yang, Minqiang
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2025, 12 (01): : 64 - 76