The Multimodal Scene Recognition Method Based on Self-Attention and Distillation

Cited by: 0
Authors
Sun, Ning [1 ]
Xu, Wei [1 ]
Liu, Jixin [1 ]
Chai, Lei [1 ]
Sun, Haian [1 ]
Affiliations
[1] Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords
Feature extraction; Training; Image recognition; Transformers; Layout; Convolutional neural networks; Network
DOI
10.1109/MMUL.2024.3415643
CLC classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Scene recognition is a challenging task in computer vision because of the diversity of objects in scene images and the ambiguity of object layouts. In recent years, the emergence of multimodal scene data has provided new solutions for scene recognition, but it has also introduced new problems. To address these challenges, this article proposes the self-attention and distillation-based multimodal scene recognition network (SAD-MSR). The backbone of the model adopts a pure self-attention transformer structure, which can extract both local and global spatial features from multimodal scene images. A multistage fusion mechanism is developed for the model: in the early stage, the concatenated tokens of the two modalities are fused by self-attention, while in the late stage, the high-level features extracted from the two modalities are fused by cross-attention. Furthermore, a distillation mechanism is introduced to alleviate the problem of the limited number of training samples. Finally, extensive experiments on two multimodal scene recognition databases, SUN RGB-D and NYU Depth, demonstrate the effectiveness of SAD-MSR: it achieves better results than other state-of-the-art multimodal scene recognition methods.
Pages: 25-36
Page count: 12