Joint spatial-temporal attention for action recognition

Cited by: 25
Authors
Yu, Tingzhao [1 ,2 ]
Guo, Chaoxu [1 ,2 ]
Wang, Lingfeng [1 ]
Gu, Huxiang [1 ]
Xiang, Shiming [1 ]
Pan, Chunhong [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Spatial-temporal attention; Two-stage; Representation
DOI
10.1016/j.patrec.2018.07.034
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel high-level action representation using a joint spatial-temporal attention model, with application to video-based human action recognition. Specifically, to extract robust motion representations from videos, we propose a new spatial attention module based on 3D convolution, which attends to the salient regions of each frame. To better handle long-duration videos, we introduce a new bidirectional-LSTM-based temporal attention module, which focuses on the key video cubes rather than the key video frames of a given video. The spatial-temporal attention network can be trained jointly via a two-stage strategy, which enables us to simultaneously explore correlations in both the spatial and temporal domains. Experimental results on benchmark action recognition datasets demonstrate the effectiveness of our network. (c) 2018 Elsevier B.V. All rights reserved.
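The abstract describes two attention components: a 3D-convolutional spatial attention module and a bidirectional-LSTM temporal attention module operating on video cubes, trained jointly in two stages. The sketch below is a minimal PyTorch illustration of how such components could be wired together; the module names (SpatialAttention3D, TemporalAttentionBiLSTM), tensor shapes, and hyperparameters are assumptions for illustration only and are not taken from the paper.

```python
# Minimal sketch of joint spatial-temporal attention (assumed design, not the
# authors' exact architecture). Requires PyTorch.
import torch
import torch.nn as nn


class SpatialAttention3D(nn.Module):
    """3D-convolutional spatial attention: re-weights salient spatial regions of a clip."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1x1 3D convolution produces a per-location saliency score (assumption).
        self.score = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        b, c, t, h, w = x.shape
        attn = self.score(x).view(b, 1, t, h * w)           # saliency logits per frame
        attn = torch.softmax(attn, dim=-1).view(b, 1, t, h, w)
        return x * attn                                      # spatially re-weighted features


class TemporalAttentionBiLSTM(nn.Module):
    """Bidirectional-LSTM temporal attention over cube-level features."""

    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, cubes: torch.Tensor) -> torch.Tensor:
        # cubes: (batch, num_cubes, feat_dim) -- one feature vector per video cube
        h, _ = self.lstm(cubes)                              # (batch, num_cubes, 2*hidden)
        weights = torch.softmax(self.score(h), dim=1)        # attention over cubes
        return (weights * h).sum(dim=1)                      # attended video-level descriptor


if __name__ == "__main__":
    # Dummy usage with illustrative shapes.
    clip = torch.randn(2, 64, 8, 28, 28)                     # (B, C, T, H, W) feature volume
    attended = SpatialAttention3D(64)(clip)                   # same shape, spatially re-weighted
    cube_feats = attended.mean(dim=(3, 4)).transpose(1, 2)    # (B, T, C) pooled cube features
    video_repr = TemporalAttentionBiLSTM(64)(cube_feats)      # (B, 512) video representation
    print(video_repr.shape)
```

In this sketch, spatial attention is applied within each video cube and the bidirectional LSTM attends across cubes, mirroring the abstract's "key video cubes instead of key video frames" idea; the two stages of the paper's training strategy are not modeled here.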
Pages: 226-233
Number of pages: 8
Related Papers
50 records in total
  • [31] Spatial-Temporal Pyramid Graph Reasoning for Action Recognition
    Geng, Tiantian
    Zheng, Feng
    Hou, Xiaorong
    Lu, Ke
    Qi, Guo-Jun
    Shao, Ling
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5484 - 5497
  • [32] Action recognition with spatial-temporal discriminative filter banks
    Martinez, Brais
    Modolo, Davide
    Xiong, Yuanjun
    Tighe, Joseph
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5481 - 5490
  • [33] Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
    Luo, Chenxu
    Yuille, Alan
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5511 - 5520
  • [34] Spatial-Temporal Interleaved Network for Efficient Action Recognition
    Jiang, Shengqin
    Zhang, Haokui
    Qi, Yuankai
    Liu, Qingshan
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2025, 21 (01) : 178 - 187
  • [35] 3D-STARNET: Spatial-Temporal Attention Residual Network for Robust Action Recognition
    Yang, Jun
    Sun, Shulong
    Chen, Jiayue
    Xie, Haizhen
    Wang, Yan
    Yang, Zenglong
    APPLIED SCIENCES-BASEL, 2024, 14 (16)
  • [36] Multiple Distilling-based spatial-temporal attention networks for unsupervised human action recognition
    Zhang, Cheng
    Zhong, Jianqi
    Cao, Wenming
    Ji, Jianhua
    INTELLIGENT DATA ANALYSIS, 2024, 28 (04) : 921 - 941
  • [37] Skeleton-based attention-aware spatial-temporal model for action detection and recognition
    Cui, Ran
    Zhu, Aichun
    Wu, Jingran
    Hua, Gang
    IET COMPUTER VISION, 2020, 14 (05) : 177 - 184
  • [38] Robust Human Action Recognition Using Global Spatial-Temporal Attention for Human Skeleton Data
    Han, Yun
    Chung, Sheng-Luen
    Ambikapathi, ArulMurugan
    Chan, Jui-Shan
    Lin, Wei-You
    Su, Shun-Feng
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [39] Spatial-Temporal Bottom-Up Top-Down Attention Model for Action Recognition
    Wang, Jinpeng
    Ma, Andy J.
    IMAGE AND GRAPHICS, ICIG 2019, PT I, 2019, 11901 : 81 - 92
  • [40] Convolution spatial-temporal attention network for EEG emotion recognition
    Cao, Lei
    Yu, Binlong
    Dong, Yilin
    Liu, Tianyu
    Li, Jie
    PHYSIOLOGICAL MEASUREMENT, 2024, 45 (12)