Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition

Cited by: 19
Authors
Du, Zhengyin [1]
Wu, Suowei [2]
Huang, Di [1]
Li, Weixin [3]
Wang, Yunhong [3]
Affiliations
[1] Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, School of Computer Science and Engineering, State Key Laboratory of Software Development Environment, Beijing 100191, People's Republic of China
[2] Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Sino-French Engineer School, Beijing 100191, People's Republic of China
[3] Beihang University, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing 100191, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Convolution; Decoding; Feature extraction; Videos; Visualization; Task analysis; Dimensional emotion recognition; Spatio-temporal fully convolutional network; Temporal hourglass CNN; Temporal intermediate supervision; Expression recognition
DOI
10.1109/TAFFC.2019.2940224
CLC classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video-based dimensional emotion recognition aims to map human affect into a dimensional emotion space from visual signals; it is a fundamental challenge in affective computing and human-computer interaction. In this paper, we present a novel encoder-decoder framework to tackle this problem. It adopts a fully convolutional design that cascades a 2D-convolution-based spatial encoder with a 1D-convolution-based temporal encoder-decoder for joint spatio-temporal modeling. In particular, to address the key issue of capturing discriminative long-term dynamic dependencies, our temporal model, referred to as the Temporal Hourglass Convolutional Neural Network (TH-CNN), extracts contextual relationships by integrating both low-level encoded and high-level decoded cues. Temporal Intermediate Supervision (TIS) is then introduced to enhance the affective representations generated by TH-CNN under a multi-resolution strategy, guiding TH-CNN to progressively learn the macroscopic long-term trend and refined short-term fluctuations. Furthermore, thanks to TH-CNN and TIS, the knowledge learned by the intermediate layers also makes it possible to offer customized solutions for different applications by adjusting the decoder depth. Extensive experiments on three benchmark databases (RECOLA, SEWA, and OMG) show superior results compared with state-of-the-art methods, indicating the effectiveness of the proposed approach.
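The abstract describes TH-CNN and TIS only at a high level. As a rough illustration of the idea (not the authors' implementation), the following pure-Python sketch assumes a single feature channel, pairwise average pooling in the encoder, nearest-neighbour upsampling in the decoder, additive skip connections that fuse each decoded level with the encoder output at the same resolution, one small regression head per decoder level, and a mean-squared-error loss against average-pooled labels at each level. None of these choices, nor the names `TemporalHourglassSketch`, `pooled_targets`, or `tis_loss`, come from the paper; they are hypothetical stand-ins, weights are random and untrained, and the 2D spatial encoder is replaced by a precomputed per-frame feature sequence.

```python
import random

# Hypothetical building blocks; a single feature channel keeps the sketch short.

def conv1d(x, w, b):
    """Zero-padded 'same' 1D convolution (cross-correlation) of sequence x
    with an odd-length kernel w plus bias b; output length equals len(x)."""
    pad = len(w) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(wj * padded[t + j] for j, wj in enumerate(w)) + b
            for t in range(len(x))]


def relu(x):
    return [max(0.0, v) for v in x]


def avg_pool2(x):
    """Halve temporal resolution by averaging non-overlapping pairs of steps."""
    return [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]


def upsample2(x):
    """Double temporal resolution by repeating each step (nearest neighbour)."""
    return [v for v in x for _ in range(2)]


class TemporalHourglassSketch:
    """Toy 1D hourglass: the encoder halves the sequence `depth` times and the
    decoder doubles it back, fusing each decoded level with the encoder output
    at the same resolution and emitting a prediction at every decoder level."""

    def __init__(self, depth=3, kernel=3, seed=0):
        rng = random.Random(seed)

        def params():  # random kernel and bias; nothing is trained here
            return ([rng.uniform(-0.5, 0.5) for _ in range(kernel)],
                    rng.uniform(-0.1, 0.1))

        self.depth = depth
        self.enc = [params() for _ in range(depth)]
        self.dec = [params() for _ in range(depth)]
        self.heads = [params() for _ in range(depth)]  # one head per decoder level

    def forward(self, x, decoder_depth=None):
        """Return per-level predictions, coarsest first. A smaller
        decoder_depth stops decoding early and yields a coarser output."""
        n_dec = self.depth if decoder_depth is None else decoder_depth
        if len(x) % (2 ** self.depth):
            raise ValueError("sequence length must be divisible by 2**depth")
        if not 1 <= n_dec <= self.depth:
            raise ValueError("decoder_depth must be in [1, depth]")
        skips, h = [], list(x)
        for w, b in self.enc:
            h = relu(conv1d(h, w, b))
            skips.append(h)              # low-level encoded cue at this resolution
            h = avg_pool2(h)
        outputs = []
        for level in range(n_dec):
            h = upsample2(h)
            h = [u + s for u, s in zip(h, skips[-1 - level])]  # additive skip fusion
            w, b = self.dec[level]
            h = relu(conv1d(h, w, b))
            hw, hb = self.heads[level]
            outputs.append(conv1d(h, hw, hb))
        return outputs


def pooled_targets(y, depth):
    """Labels at every decoder resolution, coarsest first (y pooled by 2**k)."""
    levels = [list(y)]
    for _ in range(depth - 1):
        levels.append(avg_pool2(levels[-1]))
    return levels[::-1]


def mse(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def tis_loss(outputs, y, depth):
    """Sum of per-level MSE between each intermediate output and its pooled target."""
    return sum(mse(o, t) for o, t in zip(outputs, pooled_targets(y, depth)))


if __name__ == "__main__":
    T = 16
    x = [float(i % 5) for i in range(T)]  # stand-in per-frame spatial features
    y = [0.1 * i for i in range(T)]       # stand-in per-frame labels (e.g. valence)
    net = TemporalHourglassSketch(depth=3)
    outs = net.forward(x)
    print([len(o) for o in outs])         # [4, 8, 16]
    print(tis_loss(outs, y, net.depth))
    print([len(o) for o in net.forward(x, decoder_depth=1)])  # [4]
```

Supervising every decoder level against progressively finer targets is one plausible reading of "learn macroscopic long-term trend and refined short-term fluctuations progressively": the coarsest output only sees a smoothed label curve, while the last output must match frame-level detail. Calling `forward` with a smaller `decoder_depth` stops decoding early and returns a coarser prediction, loosely mirroring the abstract's point that the decoder depth can be adjusted per application.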
Pages: 565-578
Page count: 14