AMU-Net: an accurate motion perception and aggregation model for action recognition

Cited by: 0
Authors
Zhang, Haitao [1 ]
Xia, Ying [1 ]
Feng, Jiangfan [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
video action recognition; two-dimensional convolutional networks; temporal difference; motion noise; action graph; spatial sparsity of motion objects;
DOI
10.1117/1.JEI.33.2.023053
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Motion information plays a vital role in video action recognition, serving as a fundamental basis for the accurate interpretation of dynamic sequences. However, extracting accurate motion details remains a significant challenge for two-dimensional (2D) CNNs. To address this issue, we present an action recognition framework, named accurate motion understanding network (AMU-Net), designed to effectively perceive and aggregate valuable motion cues. Specifically, AMU-Net is a 2D CNN equipped with the proposed accurate motion perceptron (AMP) and action graph module (AGM). To capture finer local motion details, the AMP is introduced to handle motion noise in temporal differences. This module extracts critical local motion patterns from bidirectional temporal differences and enhances action-related features. Furthermore, to learn more precise global motion representations, the AGM addresses the spatial sparsity of motion objects by detecting them and selectively aggregating their features within a graph reasoning framework. Extensive experiments are conducted on three public benchmarks: ActivityNet-200, UCF-101, and Kinetics-400. The results demonstrate that the proposed AMU-Net (based on ResNet-50) outperforms recent 2D CNN-based methods with comparable computational overhead. In addition, the results show that the two modules transfer effectively to three popular lightweight convolutional architectures, underscoring their versatility. (c) 2024 SPIE and IS&T
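The abstract builds on bidirectional temporal differences between adjacent frame features as the raw local motion signal that the AMP is described as denoising and refining. As a point of reference only, the following is a minimal PyTorch sketch of that raw signal; it is not the authors' AMU-Net implementation, and the function name bidirectional_temporal_diff and its zero-padding scheme are assumptions made for illustration.

    # Illustrative sketch only (not the authors' AMU-Net code): bidirectional
    # temporal differences over a clip of 2D-CNN feature maps. All names and
    # the boundary-padding choice are hypothetical.
    import torch


    def bidirectional_temporal_diff(features: torch.Tensor):
        """features: (N, T, C, H, W) per-frame feature maps from a 2D backbone.

        Returns forward (x_{t+1} - x_t) and backward (x_t - x_{t+1}) differences,
        zero-padded at the clip boundaries so the temporal length T is preserved.
        """
        forward = features[:, 1:] - features[:, :-1]
        backward = features[:, :-1] - features[:, 1:]
        pad = torch.zeros_like(features[:, :1])
        forward = torch.cat([forward, pad], dim=1)    # (N, T, C, H, W)
        backward = torch.cat([pad, backward], dim=1)  # (N, T, C, H, W)
        return forward, backward


    if __name__ == "__main__":
        clip = torch.randn(2, 8, 64, 14, 14)  # toy clip: batch of 2, 8 frames
        fwd, bwd = bidirectional_temporal_diff(clip)
        print(fwd.shape, bwd.shape)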
Pages: 19
Related papers
28 in total
  • [1] MIAM: Motion information aggregation module for action recognition
    Cheng, Qin
    Ren, Ziliang
    Liu, Zhen
    Cheng, Jun
    Zhang, Qieshi
    Liu, Jianming
    ELECTRONICS LETTERS, 2022, 58 (10) : 396 - 398
  • [2] A fast and accurate motion descriptor for human action recognition applications
    Ghorbel, Enjie
    Boutteau, Remi
    Bonnaert, Jacques
    Savatier, Xavier
    Lecoeuche, Stephane
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 919 - 924
  • [3] M2A: Motion Aware Attention for Accurate Video Action Recognition
    Gebotys, Brennan
    Wong, Alexander
    Clausi, David A.
    2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 83 - 89
  • [4] Feature Aggregation Tree: Capture Temporal Motion Information for Action Recognition in Videos
    Zhu, Bing
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 316 - 327
  • [5] Human action recognition method based on Motion Excitation and Temporal Aggregation module
    Ye, Qing
    Tan, Zexian
    Zhang, Yongmei
    HELIYON, 2022, 8 (11)
  • [6] Impaired biological motion perception and action recognition in children with autism spectrum disorder
    Wang, Liang Hui
    Chen, Tzu-Yun
    Chen, Hsin-Shui
    Chien, Sarina Hui-Lin
    I-PERCEPTION, 2014, 5 (04): : 359 - 359
  • [7] Stme-net: spatio-temporal motion excitation network for action recognition
    Zhao, Qian
    Su, Yanxiong
    Zhang, Hui
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2025, 22 (02)
  • [8] Video Action Recognition Using Motion and Multi-View Excitation with Temporal Aggregation
    Joefrie, Yuri Yudhaswana
    Aono, Masaki
    ENTROPY, 2022, 24 (11)
  • [9] Learning action recognition and implied motion-A neural model
    Layher, G.
    Neumann, H.
    PERCEPTION, 2011, 40 : 176 - 177
  • [10] LM-Net: a dynamic gesture recognition network with long-term aggregation and motion excitation
    Chang, Shaopeng
    Huang, Xueyu
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (04) : 1633 - 1645