Joint spatial-temporal attention for action recognition

Cited by: 25
Authors
Yu, Tingzhao [1, 2]
Guo, Chaoxu [1, 2]
Wang, Lingfeng [1]
Gu, Huxiang [1]
Xiang, Shiming [1]
Pan, Chunhong [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Spatial-Temporal attention; Two-Stage; REPRESENTATION
DOI
10.1016/j.patrec.2018.07.034
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel high-level action representation using a joint spatial-temporal attention model, with application to video-based human action recognition. Specifically, to extract robust motion representations of videos, a new spatial attention module based on 3D convolution is proposed, which attends to the salient parts of the spatial areas. To better handle long-duration videos, a new bidirectional LSTM based temporal attention module is introduced, which aims to focus on the key video cubes instead of the key video frames of a given video. The spatial-temporal attention network can be jointly trained via a two-stage strategy, which enables us to simultaneously explore correlations in both the spatial and temporal domains. Experimental results on benchmark action recognition datasets demonstrate the effectiveness of our network. (c) 2018 Elsevier B.V. All rights reserved.
Pages: 226 - 233
Page count: 8
Related papers
50 in total
  • [21] Spatial-temporal pooling for action recognition in videos
    Wang, Jiaming
    Shao, Zhenfeng
    Huang, Xiao
    Lu, Tao
    Zhang, Ruiqian
    Lv, Xianwei
    NEUROCOMPUTING, 2021, 451 : 265 - 278
  • [22] Spatial-temporal interaction module for action recognition
    Luo, Hui-Lan
    Chen, Han
    Cheung, Yiu-Ming
    Yu, Yawei
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (04)
  • [23] Spatial-Temporal gated graph attention network for skeleton-based action recognition
    Rahevar, Mrugendrasinh
    Ganatra, Amit
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 929 - 939
  • [24] Attention-based spatial-temporal hierarchical ConvLSTM network for action recognition in videos
    Xue, Fei
    Ji, Hongbing
    Zhang, Wenbo
    Cao, Yi
    IET COMPUTER VISION, 2019, 13 (08) : 708 - 718
  • [25] An Attention Enhanced Spatial-Temporal Graph Convolutional LSTM Network for Action Recognition in Karate
    Guo, Jianping
    Liu, Hong
    Li, Xi
    Xu, Dahong
    Zhang, Yihan
    APPLIED SCIENCES-BASEL, 2021, 11 (18)
  • [27] Extreme Low-Resolution Action Recognition with Confident Spatial-Temporal Attention Transfer
    Bai, Yucai
    Zou, Qin
    Chen, Xieyuanli
    Li, Lingxi
    Ding, Zhengming
    Chen, Long
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (06) : 1550 - 1565
  • [28] Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition
    Rahevar, Mrugendrasinh
    Ganatra, Amit
    Saba, Tanzila
    Rehman, Amjad
    Bahaj, Saeed Ali
    IEEE ACCESS, 2023, 11 : 21546 - 21553
  • [29] Streamer action recognition in live video with spatial-temporal attention and deep dictionary learning
    Li, Chenhao
    Zhang, Jing
    Yao, Jiacheng
    NEUROCOMPUTING, 2021, 453 : 383 - 392
  • [30] Activity Recognition Based on Spatial-Temporal Attention LSTM
    Xie, Zhao
    Zhou, Yi
    Wu, Ke-Wei
    Zhang, Shun-Ran
    Jisuanji Xuebao/Chinese Journal of Computers, 2021, 44 (02) : 261 - 274