AMA: attention-based multi-feature aggregation module for action recognition

Cited by: 1
Authors
Yu, Mengyun [1 ]
Chen, Ying [1 ]
Institutions
[1] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi 214000, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Channel excitation; Spatial-temporal aggregation; Convolution neural network; FRAMEWORK;
D O I
10.1007/s11760-022-02268-2
CLC Classification
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Learning spatial information, modeling temporal dynamics, and capturing channel relationships are all important for action recognition in videos. In this work, an attention-based multi-feature aggregation (AMA) module that encodes these features in a unified module is proposed; it contains a spatial-temporal aggregation (STA) structure and a channel excitation (CE) structure. STA mainly employs two convolutions to model spatial and temporal features, respectively, and the matrix multiplication in STA captures long-range dependencies. CE learns the importance of each channel, biasing the allocation of available resources toward the informative features. The AMA module is simple and efficient enough to be inserted into a standard ResNet architecture without any modification, enhancing the representational power of the network. We equip ResNet-50 with the AMA module to build an effective AMA Net with limited extra computation cost, only 1.002 times that of ResNet-50. Extensive experiments indicate that AMA Net outperforms state-of-the-art methods on UCF101 and HMDB51, improving on the baseline by 6.2% and 10.0%, respectively. In short, AMA Net achieves the high accuracy of 3D convolutional neural networks while maintaining the complexity of 2D convolutional neural networks.
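The abstract's description of the AMA module can be sketched as follows. This is a hypothetical reconstruction from the abstract alone, not the authors' implementation: the kernel sizes, the reduction ratio, the squeeze-and-excitation-style form of CE, and the residual insertion point are all assumptions, and the matrix-multiplication (non-local) step inside STA that the abstract mentions is omitted for brevity.

```python
import torch
import torch.nn as nn

class ChannelExcitation(nn.Module):
    """Hypothetical CE structure: learns per-channel importance weights
    (squeeze-and-excitation style; the reduction ratio is an assumption)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (N, C, T, H, W)
        w = x.mean(dim=(2, 3, 4))              # global average pool -> (N, C)
        w = self.fc(w).view(x.size(0), x.size(1), 1, 1, 1)
        return x * w                           # reweight channels

class STA(nn.Module):
    """Hypothetical STA structure: one spatial (1x3x3) and one temporal
    (3x1x1) convolution, as the abstract describes 'two convolutions'."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):
        return self.temporal(self.spatial(x))

class AMA(nn.Module):
    """STA followed by CE, added residually so the module can be dropped
    into a standard ResNet block without changing tensor shapes."""
    def __init__(self, channels: int):
        super().__init__()
        self.sta = STA(channels)
        self.ce = ChannelExcitation(channels)

    def forward(self, x):
        return x + self.ce(self.sta(x))
```

Because the module is shape-preserving and residual, it can in principle be interleaved with the blocks of a 2D backbone such as ResNet-50, which matches the abstract's claim of near-zero extra cost (1.002x).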
Pages: 619-626
Page count: 8
Related papers
50 items
  • [31] Multi-Feature Based Emotion Recognition for Video Clips
    Liu, Chuanhe
    Tang, Tianhao
    Lv, Kui
    Wang, Minghao
    ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018, : 630 - 634
  • [32] Semantic attention-based heterogeneous feature aggregation network for image fusion
    Ruan, Zhiqiang
    Wan, Jie
    Xiao, Guobao
    Tang, Zhimin
    Ma, Jiayi
    PATTERN RECOGNITION, 2024, 155
  • [33] Beauty Product Image Retrieval Based on Multi-Feature Fusion and Feature Aggregation
    Wang, Qi
    Lai, Jingxiang
    Xu, Kai
    Liu, Wenyin
    Lei, Liang
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 2063 - 2067
  • [34] Multi-feature fusion gaze estimation based on attention mechanism
    Hu, Zhangfang
    Xia, Yanling
    Luo, Yuan
    Wang, Lan
    OPTOELECTRONIC IMAGING AND MULTIMEDIA TECHNOLOGY VIII, 2021, 11897
  • [35] Dehazing algorithm based on mixed attention and multi-feature interaction
    Yang, Yan
    Zhang, Quanjun
    Liang, Haobo
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2024, 56 (09): : 56 - 64
  • [36] Human action recognition based on multi-feature fusion and hierarchical BP-AdaBoost algorithm
    Wu, Z. (zhenyang@seu.edu.cn), 1600, Southeast University (44):
  • [37] ATSN: Attention-Based Temporal Segment Network for Action Recognition
    Sun, Yun-lei
    Zhang, Da-lin
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2019, 26 (06): : 1664 - 1669
  • [38] Gait recognition via weighted global-local feature fusion and attention-based multiscale temporal aggregation
    Xu, Yingqi
    Xi, Hao
    Ren, Kai
    Zhu, Qiyuan
    Hu, Chuanping
    JOURNAL OF ELECTRONIC IMAGING, 2025, 34 (01)
  • [39] Recurrent Temporal Sparse Autoencoder for Attention-based Action Recognition
    Xin, Miao
    Zhang, Hong
    Sun, Mingui
    Yuan, Ding
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 456 - 463
  • [40] Continuous Estimation of Human Joint Angles From sEMG Using a Multi-Feature Temporal Convolutional Attention-Based Network
    Wang, Shurun
    Tang, Hao
    Gao, Lifu
    Tan, Qi
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (11) : 5461 - 5472