MgMViT: Multi-Granularity and Multi-Scale Vision Transformer for Efficient Action Recognition

Cited by: 1
Authors
Huo, Hua [1 ]
Li, Bingjie [1 ]
Affiliations
[1] Henan Univ Sci & Technol, Informat Engn Coll, Luoyang 471000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
action recognition; multi-granularity multi-scale fusion; vision transformer; efficiency;
DOI
10.3390/electronics13050948
CLC number
TP [Automation Technology; Computer Technology];
Discipline code
0812;
Abstract
Video-based action recognition is developing rapidly. Although Vision Transformers (ViTs) have made great progress on static images, they are not yet fully optimized for dynamic video applications. Convolutional Neural Networks (CNNs) and related models perform well in video action recognition, but issues such as high computational cost and large memory consumption remain. Current research therefore focuses on effective methods that improve model performance while overcoming these limits. We present a Vision Transformer based on multi-granularity and multi-scale fusion, designed for efficient action recognition in videos with reduced computational cost and memory usage. First, we devise a multi-scale, multi-granularity module integrated with Transformer blocks. Second, a hierarchical structure manages information at multiple scales, and multi-granularity is introduced on top of multi-scale, allowing a selective choice of how many tokens enter the next computational step and thereby reducing redundant tokens. Third, a coarse-fine granularity fusion layer shortens the sequence by merging tokens with lower information content. Together, these mechanisms optimize the allocation of resources in the model, emphasizing critical information and reducing redundancy to minimize computational cost. We assess the proposed approach with comprehensive experiments on benchmark action recognition datasets; the results demonstrate state-of-the-art accuracy and efficiency.
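The token-reduction idea sketched in the abstract (keep high-information "fine" tokens, merge low-information ones into a coarse token to shorten the sequence) can be illustrated roughly as follows. This is a minimal NumPy sketch under assumptions, not the paper's implementation: the per-token importance scores, the `keep_ratio` parameter, and the score-weighted-mean fusion are all hypothetical stand-ins for whatever the MgMViT fusion layer actually computes.

```python
import numpy as np

def coarse_fine_fusion(tokens, scores, keep_ratio=0.5):
    """Hypothetical coarse-fine fusion step.

    tokens: (n, d) array of token embeddings.
    scores: (n,) per-token importance scores (assumed given, e.g. from
            attention weights; the real scoring rule is not specified here).
    Keeps the top-scoring tokens unchanged ("fine" granularity) and merges
    the remainder into one score-weighted "coarse" token, so the sequence
    passed to the next block is shorter.
    """
    n, d = tokens.shape
    k = max(1, int(n * keep_ratio))
    order = np.argsort(scores)[::-1]          # highest-information first
    fine_idx, coarse_idx = order[:k], order[k:]
    fine = tokens[fine_idx]
    if coarse_idx.size == 0:
        return fine
    w = scores[coarse_idx]
    total = w.sum()
    w = w / total if total > 0 else np.full_like(w, 1.0 / w.size)
    # Merge the low-information tokens into a single coarse token.
    coarse = (w[:, None] * tokens[coarse_idx]).sum(axis=0, keepdims=True)
    return np.concatenate([fine, coarse], axis=0)
```

With `keep_ratio=0.5`, a sequence of 8 tokens shrinks to 5 (4 fine + 1 coarse), which is the kind of sequence-length reduction the fusion layer is described as providing.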
Pages: 16