AMA: attention-based multi-feature aggregation module for action recognition

Times Cited: 1
Authors
Yu, Mengyun [1 ]
Chen, Ying [1 ]
Affiliations
[1] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi 214000, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Channel excitation; Spatial-temporal aggregation; Convolutional neural network; FRAMEWORK;
DOI
10.1007/s11760-022-02268-2
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Spatial information learning, temporal modeling, and the capture of channel relationships are all important for action recognition in videos. In this work, an attention-based multi-feature aggregation (AMA) module that encodes these features in a unified module is proposed; it contains a spatial-temporal aggregation (STA) structure and a channel excitation (CE) structure. STA mainly employs two convolutions to model spatial and temporal features, respectively, and the matrix multiplication in STA captures long-range dependencies. CE learns the importance of each channel so as to bias the allocation of available resources toward informative features. The AMA module is simple yet efficient and can be inserted into a standard ResNet architecture without any modification, enhancing the representational power of the network. We equip ResNet-50 with the AMA module to build an effective AMA Net at limited extra computational cost, only 1.002 times that of ResNet-50. Extensive experiments indicate that AMA Net outperforms state-of-the-art methods on UCF101 and HMDB51, exceeding the baseline by 6.2% and 10.0%, respectively. In short, AMA Net achieves the high accuracy of 3D convolutional neural networks while maintaining the complexity of 2D convolutional neural networks.
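The abstract describes the architecture only at a high level, so the following PyTorch sketch is purely illustrative: it shows one plausible way to combine an SE-style channel excitation branch with a spatial-temporal aggregation branch that uses a matrix multiplication as an attention step, packaged as a drop-in residual block. All class names, layer choices, and the n_segments parameter are assumptions made for this sketch; it is not the authors' implementation.

import torch
import torch.nn as nn


class ChannelExcitation(nn.Module):
    """SE-style channel excitation: learns a per-channel importance weight
    (a sketch of the CE idea; the reduction ratio is an assumption)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (N*T, C, H, W)
        w = x.mean(dim=(2, 3))                 # global average pooling -> (N*T, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # reweight channels


class SpatialTemporalAggregation(nn.Module):
    """Sketch of STA: one spatial conv and one temporal conv, combined via a
    matrix product over flattened spatial positions to capture long-range
    dependencies. The exact wiring is an assumption."""
    def __init__(self, channels, n_segments):
        super().__init__()
        self.n_segments = n_segments
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.temporal = nn.Conv1d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x):                      # x: (N*T, C, H, W), T = n_segments
        nt, c, h, w = x.shape
        t = self.n_segments
        s = self.spatial(x)                                      # spatial features
        xt = x.view(-1, t, c, h * w).mean(-1)                    # (N, T, C)
        xt = self.temporal(xt.transpose(1, 2)).transpose(1, 2)   # temporal features
        # matrix multiplication: correlate temporal descriptors with spatial maps
        attn = torch.bmm(xt.reshape(-1, 1, c), s.view(nt, c, h * w))  # (N*T, 1, H*W)
        attn = torch.softmax(attn, dim=-1).view(nt, 1, h, w)
        return x + s * attn                                      # residual aggregation


class AMA(nn.Module):
    """Attention-based multi-feature aggregation block as one might insert
    after a ResNet bottleneck; illustrative only, not the released code."""
    def __init__(self, channels, n_segments=8):
        super().__init__()
        self.sta = SpatialTemporalAggregation(channels, n_segments)
        self.ce = ChannelExcitation(channels)

    def forward(self, x):                      # x: (N*T, C, H, W)
        return self.ce(self.sta(x))

In this sketch, such a block would be placed after each bottleneck of a ResNet-50 that processes N*T frame-level feature maps, which is consistent with the abstract's claim that the overall cost stays close to that of the plain 2D backbone.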
Pages: 619 - 626
Number of pages: 8