A Hybrid Network for Large-Scale Action Recognition from RGB and Depth Modalities

Cited by: 21
Authors
Wang, Huogen [1, 2]
Song, Zhanjie [3]
Li, Wanqing [2]
Wang, Pichao [4]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Univ Wollongong, Adv Multimedia Res Lab, Wollongong, NSW 2522, Australia
[3] Tianjin Univ, Sch Math, Tianjin 300350, Peoples R China
[4] Alibaba Grp US Inc, Bellevue, WA 98004 USA
Funding
National Natural Science Foundation of China
Keywords
action recognition; weighted rank pooling; weighted dynamic image; 3D convolutional LSTM network; canonical correlation analysis
DOI
10.3390/s20113305
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Subject classification codes
070302; 081704
Abstract
The paper presents a novel hybrid network for large-scale action recognition from multiple modalities (RGB and depth). The network is built upon the proposed weighted dynamic images. It leverages the strengths of both Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches to address challenges in large-scale action recognition that state-of-the-art methods do not fully handle. Specifically, the hybrid network consists of a CNN-based component and an RNN-based component. Features extracted by the two components are fused through canonical correlation analysis (CCA) and then fed to a linear Support Vector Machine (SVM) for classification. The proposed network achieved state-of-the-art results on the ChaLearn LAP IsoGD, NTU RGB+D and Multi-modal & Multi-view & Interactive (M²I) datasets, outperforming existing methods by a large margin (over 10 percentage points in some cases).
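The fusion step in the abstract (two feature sets combined through canonical correlation analysis before a linear SVM) can be illustrated with a minimal NumPy sketch of standard regularized CCA feature fusion. This is not the authors' implementation: the function names `cca_fit` and `cca_fuse`, the ridge term `reg`, the choice between concatenating or summing the projected features (`mode`), and the synthetic `cnn_feats`/`rnn_feats` arrays are all hypothetical. The fused vectors would then go to a linear SVM (for example scikit-learn's `LinearSVC`), which is omitted to keep the sketch NumPy-only.

```python
# Hypothetical sketch: generic regularized CCA feature fusion, NOT the paper's code.
import numpy as np


def _inv_sqrt(cov: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return (vecs / np.sqrt(vals)) @ vecs.T


def cca_fit(X: np.ndarray, Y: np.ndarray, n_components: int, reg: float = 1e-4):
    """Learn CCA projections for two views whose rows are paired samples.

    Returns (mean_x, W_x, mean_y, W_y, canonical_correlations).
    """
    n = X.shape[0]
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    # Assumption: a small ridge term keeps the within-view covariances
    # invertible (high-dimensional deep features are often rank-deficient).
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular vectors of the whitened cross-covariance give the canonical
    # directions; singular values (descending) are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    W_x = Kx @ U[:, :n_components]
    W_y = Ky @ Vt.T[:, :n_components]
    return mean_x, W_x, mean_y, W_y, s[:n_components]


def cca_fuse(X: np.ndarray, Y: np.ndarray, model, mode: str = "concat") -> np.ndarray:
    """Project both views into the canonical space and fuse them (assumed modes)."""
    mean_x, W_x, mean_y, W_y, _ = model
    Zx = (X - mean_x) @ W_x
    Zy = (Y - mean_y) @ W_y
    if mode == "concat":
        return np.hstack([Zx, Zy])
    if mode == "sum":
        return Zx + Zy
    raise ValueError(f"unknown fusion mode: {mode!r}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 3))  # synthetic signal shared by both views
    # Hypothetical stand-ins for CNN-branch and RNN-branch features.
    cnn_feats = latent @ rng.normal(size=(3, 16)) + 0.1 * rng.normal(size=(200, 16))
    rnn_feats = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))
    model = cca_fit(cnn_feats, rnn_feats, n_components=3)
    fused = cca_fuse(cnn_feats, rnn_feats, model)
    print("canonical correlations:", np.round(model[4], 3))
    print("fused feature shape:", fused.shape)
```

In this sketch the projections are learned once on training features; test-time features are passed through `cca_fuse` with the same `model`, so the classifier always sees vectors from the same canonical space.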
Pages: 1-25
Number of pages: 25
Related papers
50 in total
  • [41] Hybrid Deep Learning Ensemble Model for Improved Large-Scale Car Recognition
    Verma, Abhishek
    Liu, Yu
    2017 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTED, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2017,
  • [42] Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Streams
    Tur, Anil Osman
    Keles, Hacer Yalim
    PROCEEDINGS OF 18TH INTERNATIONAL CONFERENCE ON SMART TECHNOLOGIES (IEEE EUROCON 2019), 2019,
  • [43] Large-Scale Visual Font Recognition
    Chen, Guang
    Yang, Jianchao
    Jin, Hailin
    Brandt, Jonathan
    Shechtman, Eli
    Agarwala, Aseem
    Han, Tony X.
    2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 3598 - 3605
  • [44] Large-Scale Visual Speech Recognition
    Shillingford, Brendan
    Assael, Yannis
    Hoffman, Matthew W.
    Paine, Thomas
    Hughes, Cian
    Prabhu, Utsav
    Liao, Hank
    Sak, Hasim
    Rao, Kanishka
    Bennett, Lorrayne
    Mulville, Marie
    Denil, Misha
    Coppin, Ben
    Laurie, Ben
    Senior, Andrew
    de Freitas, Nando
    INTERSPEECH 2019, 2019, : 4135 - 4139
  • [45] On the preconditions for large-scale collective action
    Jagers, Sverker C.
    Harring, Niklas
    Lofgren, Asa
    Sjostedt, Martin
    Alpizar, Francisco
    Brulde, Bengt
    Langlet, David
    Nilsson, Andreas
    Almroth, Bethanie Carney
    Dupont, Sam
    Steffen, Will
    AMBIO, 2020, 49 (07) : 1282 - 1296
  • [47] Large-scale Monocular Depth Estimation in the Wild
    Haji-Esmaeili, Mohammad M.
    Montazer, Gholamali
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [48] Describing Trajectory of Surface Patch for Human Action Recognition on RGB and Depth Videos
    Song, Yan
    Liu, Shi
    Tang, Jinhui
    IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (04) : 426 - 429
  • [49] LPHD: A Large-Scale Head Pose Dataset for RGB Images
    Sun, Wei
    Fan, Yezhao
    Min, Xiongkuo
    Peng, Shihao
    Ma, Siwei
    Zhai, Guangtao
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1084 - 1089
  • [50] Local and Global Feature Descriptors Combination from RGB-Depth Videos for Human Action Recognition
    Al-Akam, Rawya
    Paulus, Dietrich
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM 2018), 2018, : 265 - 272