Deep Feature Flow for Video Recognition

Cited by: 416
Authors
Zhu, Xizhou [1, 2]
Xiong, Yuwen [2]
Dai, Jifeng [2]
Yuan, Lu [2]
Wei, Yichen [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
DOI
10.1109/CVPR.2017.441
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup as flow computation is relatively fast. The end-to-end training of the whole architecture significantly boosts the recognition accuracy. Deep feature flow is flexible and general. It is validated on two video datasets on object detection and semantic segmentation. It significantly advances the practice of video recognition tasks. Code will be released.
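The inference scheme described in the abstract (expensive feature extraction on sparse key frames, flow-based propagation elsewhere) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The names `bilinear_sample`, `warp`, `run_video`, `feature_net`, `flow_net`, and `key_interval` are hypothetical, and the sketch assumes a fixed key-frame interval, a single-channel feature map stored as a list of rows, and a flow field that gives, for each position in the current frame, the (dy, dx) offset to the matching position in the key frame.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D grid at a real-valued (y, x), clamping to the borders."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def warp(key_feat, flow):
    """Propagate key-frame features to the current frame.

    Assumption: flow[y][x] = (dy, dx) points from a position in the
    current frame to the corresponding position in the key frame.
    """
    h, w = len(key_feat), len(key_feat[0])
    return [
        [
            bilinear_sample(key_feat, y + flow[y][x][0], x + flow[y][x][1])
            for x in range(w)
        ]
        for y in range(h)
    ]


def run_video(frames, feature_net, flow_net, key_interval):
    """Run feature_net on every key_interval-th frame only.

    feature_net and flow_net are caller-supplied stand-ins for the
    expensive feature sub-network and the cheaper flow estimator.
    """
    key_frame = None
    key_feat = None
    outputs = []
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            # Key frame: pay the full cost of feature extraction.
            key_frame, key_feat = frame, feature_net(frame)
            outputs.append(key_feat)
        else:
            # Non-key frame: estimate flow and warp the cached features.
            flow = flow_net(key_frame, frame)
            outputs.append(warp(key_feat, flow))
    return outputs
```

With `key_interval = 2` and five frames, `feature_net` is called three times (frames 0, 2, 4) and the remaining two frames reuse warped key-frame features, which is where the speedup would come from when `flow_net` is much cheaper than `feature_net`.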
Pages: 4141-4150
Page count: 10
Related papers
50 in total
  • [1] Deep Local Video Feature for Action Recognition
    Lan, Zhenzhong
    Zhu, Yi
    Hauptmann, Alexander G.
    Newsam, Shawn
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 1219 - 1225
  • [2] Video Emotion Recognition with Transferred Deep Feature Encodings
    Xu, Baohan
    Fu, Yanwei
    Jiang, Yu-Gang
    Li, Boyang
    Sigal, Leonid
    ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 15 - 22
  • [3] Deep Optical Flow Feature Fusion Based on 3D Convolutional Networks for Video Action Recognition
    Lu, Tongwei
    Ai, Shihui
    Jiang, Yongyuan
    Xiong, Yudian
    Min, Feng
    2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2018, : 1077 - 1080
  • [4] Sparse Feature Auto-combination Deep Network for Video Action Recognition
    Wang, Qicong
    Gong, Dingxi
    Li, Maozhen
    Zhao, Chong
    Lei, Yunqi
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017, : 712 - 716
  • [5] DEEP KEY CLIPS-VIDEO FEATURE FUSION FRAMEWORK FOR ACTION RECOGNITION
    Li, Chao
    Ming, Yue
    Shen, Yuan
    Yu, Hui
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2019, : 156 - 161
  • [6] Video-Based Emotion Recognition using Face Frontalization and Deep Spatiotemporal Feature
    Wang, Jinwei
    Zhao, Ziping
    Liang, Jinglian
    Li, Chao
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [7] Video-Audio Emotion Recognition Based on Feature Fusion Deep Learning Method
    Song, Yanan
    Cai, Yuanyang
    Tan, Lizhe
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 611 - 616
  • [8] Temporal sparse feature auto-combination deep network for video action recognition
    Wang, Qicong
    Gong, Dingxi
    Qi, Man
    Shen, Yehu
    Lei, Yunqi
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23):
  • [9] A Deep Feature based Multi-kernel Learning Approach for Video Emotion Recognition
    Li, Wei
    Abtahi, Farnaz
    Zhu, Zhigang
    ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2015, : 482 - 489
  • [10] Dynamic FERNet: Deep learning with optimal feature selection for face expression recognition in video
    Jagadeesh, M.
    Baranidharan, B.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (28):