Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization

Cited by: 0
|
Authors
Yang, Yuheng [1 ]
Chen, Haipeng [1 ]
Liu, Zhenguang [2 ]
Lyu, Yingda [3 ]
Zhang, Beibei [5 ]
Wu, Shuang [4 ]
Wang, Zhibo [2 ]
Ren, Kui [2 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Jilin, Peoples R China
[2] Zhejiang Univ, Sch Cyber Sci & Technol, Hangzhou, Peoples R China
[3] Jilin Univ, Publ Comp Educ & Res Ctr, Jilin, Peoples R China
[4] Black Sesame Technol, Solaris, Singapore
[5] Zhejiang Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high-dimensional nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constraints on extracting task-relevant features from the input. In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i.e., streams) supplement each other toward more precise action recognition while attention capitalizes on the important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this into a mutual information loss. Empirically, our approach sets the new state-of-the-art performance on three benchmark datasets: NTU RGB+D, NTU RGB+D 120, and NW-UCLA.
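The acceleration representation described in contribution (1) can be illustrated with a minimal sketch: given a skeleton sequence of joint positions, the velocity stream is the first-order temporal difference and the acceleration stream is the second-order difference. The function name `motion_streams` and the exact array layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def motion_streams(joints: np.ndarray):
    """Derive velocity and acceleration streams from a skeleton sequence.

    joints: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Returns (velocity, acceleration), the first- and second-order temporal
    differences of the joint positions. This is an illustrative sketch of a
    multi-stream motion representation, not the paper's exact pipeline.
    """
    velocity = np.diff(joints, n=1, axis=0)      # shape (T-1, J, 3)
    acceleration = np.diff(joints, n=2, axis=0)  # shape (T-2, J, 3)
    return velocity, acceleration
```

A sequence following constant acceleration (position 0.5 * t^2) yields a constant acceleration stream of 1.0, matching the physical intuition that the second difference recovers the driving force behind the motion.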
Pages: 1658 - 1666
Page count: 9
Related Papers
50 records in total
  • [21] Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network
    Li, Bing
    Kong, Longteng
    Zhang, Dongming
    Bao, Xiuguo
    Huang, Di
    Wang, Yunhong
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3744 - 3750
  • [22] Multi-stream slowFast graph convolutional networks for skeleton-based action recognition
    Sun, Ning
    Leng, Ling
    Liu, Jixin
    Han, Guang
    IMAGE AND VISION COMPUTING, 2021, 109
  • [23] Multi-Stream Fusion Network for Skeleton-Based Construction Worker Action Recognition
    Tian, Yuanyuan
    Liang, Yan
    Yang, Haibin
    Chen, Jiayu
    SENSORS, 2023, 23 (23)
  • [24] Skeleton-Based Action Recognition With Multi-Stream Adaptive Graph Convolutional Networks
    Shi, Lei
    Zhang, Yifan
    Cheng, Jian
    Lu, Hanqing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 9532 - 9545
  • [25] MULTI-STREAM SUM RATE MAXIMIZATION FOR MIMO AF RELAY NETWORKS
    Sun, Cong
    Jorswieck, Eduard
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 4434 - 4438
  • [26] Robust Speaker Recognition Based on Multi-Stream Features
    Wang, Ning
    Wang, Lei
    2016 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-CHINA (ICCE-CHINA), 2016,
  • [27] Multi-stream Convolutional Networks for Indoor Scene Recognition
    Anwer, Rao Muhammad
    Khan, Fahad Shahbaz
    Laaksonen, Jorma
    Zaki, Nazar
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, CAIP 2019, PT I, 2019, 11678 : 196 - 208
  • [28] SUBBAND HYBRID FEATURE FOR MULTI-STREAM SPEECH RECOGNITION
    Li, Feipeng
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [29] A multi-stream approach to audiovisual automatic speech recognition
    Hasegawa-Johnson, Mark
    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2007, : 328 - 331
  • [30] Multi-Stream End-to-End Speech Recognition
    Li, Ruizhi
    Wang, Xiaofei
    Mallidi, Sri Harish
    Watanabe, Shinji
    Hori, Takaaki
    Hermansky, Hynek
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 646 - 655