Compressed Video Action Recognition With Dual-Stream and Dual-Modal Transformer

Cited: 4
Authors
Mou, Yuting [1 ]
Jiang, Xinghao [1 ]
Xu, Ke [1 ]
Sun, Tanfeng [1 ]
Wang, Zepeng [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Natl Engn Lab Informat Content Anal Tech, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Compressed video; action recognition; NETWORK; EFFICIENCY;
DOI
10.1109/TCSVT.2023.3319140
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification
0808; 0809;
Abstract
Compressed-video action recognition offers the advantage of reduced decoding and inference time compared with working in the RGB domain. However, the compressed domain poses unique challenges because it contains different frame types (I-frames and P-frames). I-frames, which are consistent with RGB frames, are rich in information, but their redundancy may interfere with the recognition task. P-frames contain two modalities, residuals (R) and motion vectors (MV); although they carry less information, they reflect motion cues. To address these challenges and exploit the complementary information in the different frame types and modalities, we propose a novel approach called the Dual-Stream and Dual-Modal Transformer (DSDMT). Our approach consists of two streams: 1) The short-span P-frame stream captures temporal information. We propose a Dual-Modal Attention Module (DAM) to mine the variability between the P-frame modalities and complement the orthogonal feature vectors. In addition, given the sparsity of P-frames, we extract action features with Frame-level Patch Embedding (FPE) to avoid redundant computation. 2) The long-span I-frame stream extracts the global context of the entire video, including content and scene information. By fusing the global video context with local key-frame features, our model represents actions at both fine and coarse granularity. We evaluated the proposed DSDMT on three public benchmarks of different scales: HMDB-51, UCF-101, and Kinetics-400. Our model achieves better performance with fewer FLOPs and lower latency. Our analysis shows that the independence and complementarity of the I-frames and P-frames extracted from the compressed video stream play a crucial role in action recognition.
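The abstract's two-stream design can be sketched in code. The toy below is an illustrative reading of the idea only, not the authors' implementation: one modality of the P-frame stream (motion vectors) attends over the other (residuals), the component already explained by the attended feature is subtracted to keep a complementary part, and the pooled P-frame feature is late-fused with a pooled I-frame global feature by concatenation. All function names, shapes, and the projection-based "complement" are assumptions for illustration.

```python
import numpy as np

def cross_modal_attention(mv_feat, res_feat):
    """Toy dual-modal attention: motion-vector tokens attend over residual
    tokens; the part of each MV token already explained by the attended
    residual feature is removed, leaving a complementary component.
    Shapes and the orthogonalization step are illustrative assumptions."""
    d = mv_feat.shape[-1]
    # Scaled dot-product attention scores: (N_mv, N_res).
    scores = mv_feat @ res_feat.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    attended = weights @ res_feat                           # (N_mv, d)
    # Projection coefficient of each MV token onto its attended feature.
    proj = (np.sum(mv_feat * attended, axis=-1, keepdims=True)
            / (np.sum(attended * attended, axis=-1, keepdims=True) + 1e-8))
    # Keep the residual (near-orthogonal) component of the MV token.
    return mv_feat - proj * attended

def fuse_streams(i_frame_global, p_frame_local):
    """Toy late fusion of the long-span I-frame context (coarse) with the
    short-span P-frame motion feature (fine) by concatenation."""
    return np.concatenate([i_frame_global, p_frame_local], axis=-1)

# Toy usage with random features.
rng = np.random.default_rng(0)
mv = rng.normal(size=(4, 8))     # 4 motion-vector tokens, dim 8
res = rng.normal(size=(4, 8))    # 4 residual tokens, dim 8
p_local = cross_modal_attention(mv, res).mean(axis=0)  # pooled P-stream feature
i_global = rng.normal(size=(8,))                       # pooled I-stream feature
fused = fuse_streams(i_global, p_local)
print(fused.shape)  # (16,)
```

Concatenation is only one possible fusion; the point of the sketch is that the two streams stay independent until a final fusion step, mirroring the fine-grained/coarse-grained split described above.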
Pages: 3299 - 3312
Number of pages: 14
Related Papers
50 records total
  • [21] A Dual-Stream Transformer With Diff-Attention for Multispectral and Panchromatic Classification
    Xu, Lin
    Zhu, Hao
    Jiao, Licheng
    Zhao, Wenhao
    Li, Xiaotong
    Hou, Biao
    Ren, Zhongle
    Ma, Wenping
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 14
  • [22] A DUAL-STREAM NEUROANATOMY OF SINGING
    Loui, Psyche
    MUSIC PERCEPTION, 2015, 32 (03): : 232 - 241
  • [23] Dual-stream VO: Visual Odometry Based on LSTM Dual-Stream Convolutional Neural Network
    Luo, Yuan
    Zeng, YongChao
    Lv, RunZhe
    Wang, WenHao
    ENGINEERING LETTERS, 2022, 30 (03) : 926 - 934
  • [24] Video salient object detection using dual-stream spatiotemporal attention
    Xu, Chenchu
    Gao, Zhifan
    Zhang, Heye
    Li, Shuo
    de Albuquerque, Victor Hugo C.
    APPLIED SOFT COMPUTING, 2021, 108
  • [25] The motion vector reuse algorithm to improve dual-stream video encoder
    Zhou, Hong
    Zhou, Jingli
    Xia, Xiaojian
    ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 1284 - 1287
  • [26] Bayesian Cellular Automata Fusion Model Based on Dual-Stream Strategy for Video Anomaly Action Detection
    Zhao, Zhongtang
    Li, Ruixian
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2021, 31 (04) : 688 - 698
  • [28] Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval
    Li, Pandeng
    Xie, Hongtao
    Ge, Jiannan
    Zhang, Lei
    Min, Shaobo
    Zhang, Yongdong
    COMPUTER VISION - ECCV 2022, PT XIV, 2022, 13674 : 181 - 197
  • [29] Dual-stream spatio-temporal decoupling network for video deblurring
    Ning, Taigong
    Li, Weihong
    Li, Zhenghao
    Zhang, Yanfang
    APPLIED SOFT COMPUTING, 2022, 116
  • [30] Dual-stream cross-modal fusion alignment network for survival analysis
    Song, Jinmiao
    Hao, Yatong
    Zhao, Shuang
    Zhang, Peng
    Feng, Qilin
    Dai, Qiguo
    Duan, Xiaodong
    BRIEFINGS IN BIOINFORMATICS, 2025, 26 (02)