Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model

Cited: 1
Authors
Pan Na [1 ]
Jiang Min [1 ]
Kong Jun [1 ]
Affiliations
[1] Jiangnan Univ, Jiangsu Prov Engn Lab Pattern Recognit & Computat, Wuxi 214122, Jiangsu, Peoples R China
Keywords
machine vision; action recognition; two-stream network; attention; deep learning; interaction; SPATIAL-TEMPORAL ATTENTION; VIDEO;
DOI
10.3788/LOP57.181506
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Code
0808 ; 0809 ;
Abstract
A human action recognition algorithm based on a spatio-temporal interactive attention model (STIAM) is proposed to address low recognition accuracy, which stems from the inability of the two-stream network to effectively extract the valid frames in each video and the valid regions in each frame. First, the proposed algorithm applies two different deep learning networks to extract spatial and temporal features, respectively. Next, a mask-guided spatial attention model is designed to compute the salient regions in each frame, and an optical-flow-guided temporal attention model is designed to locate the salient frames in each video. Finally, the weights obtained from the temporal and spatial attention are applied to the spatial and temporal features, respectively, so that the model realizes spatio-temporal interaction. Experimental results on the UCF101 and Penn Action datasets show that, compared with existing methods, STIAM achieves high feature extraction performance and clearly improves action recognition accuracy.
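The abstract's cross-weighting idea (temporal attention re-weighting the spatial stream and vice versa) can be sketched as follows. The paper's exact formulation is not given in the abstract, so the scoring functions, tensor shapes, and all names here (`interactive_attention`, dot-product guidance between streams) are illustrative assumptions, not the authors' method:

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interactive_attention(spatial_feats, temporal_feats):
    """Minimal sketch of spatio-temporal interactive weighting.

    spatial_feats:  (T, N, C) appearance features, N spatial regions per frame
    temporal_feats: (T, C)    motion (optical-flow) features per frame
    """
    # Spatial attention: score each region of a frame against that frame's
    # motion feature, so motion cues guide the salient-region selection.
    spatial_scores = np.einsum('tnc,tc->tn', spatial_feats, temporal_feats)
    spatial_w = softmax(spatial_scores, axis=1)               # (T, N)
    attended_spatial = np.einsum('tn,tnc->tc', spatial_w, spatial_feats)

    # Temporal attention: score each frame's motion feature against its
    # attended appearance feature, so appearance guides frame selection.
    temporal_scores = np.einsum('tc,tc->t', temporal_feats, attended_spatial)
    temporal_w = softmax(temporal_scores, axis=0)             # (T,)

    # Interaction: temporal weights pool the spatially attended features
    # into one video-level descriptor.
    video_feat = np.einsum('t,tc->c', temporal_w, attended_spatial)
    return video_feat, spatial_w, temporal_w
```

In this sketch each attention stream is conditioned on the other stream's features, which is the "interaction" the abstract refers to; both weight vectors are softmax-normalized so they sum to one over regions and frames, respectively.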
Pages: 9