Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition

Cited by: 19
Authors
Du, Zhengyin [1]
Wu, Suowei [2]
Huang, Di [1]
Li, Weixin [3]
Wang, Yunhong [3]
Affiliations
[1] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sino French Engineer Sch, Beijing 100191, Peoples R China
[3] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; Convolution; Decoding; Feature extraction; Videos; Visualization; Task analysis; Dimensional emotion recognition; spatio-temporal fully convolutional network; temporal hourglass CNN; temporal intermediate supervision; EXPRESSION RECOGNITION;
DOI
10.1109/TAFFC.2019.2940224
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video-based dimensional emotion recognition aims to map human affect into the dimensional emotion space based on visual signals, which is a fundamental challenge in affective computing and human-computer interaction. In this paper, we present a novel encoder-decoder framework to tackle this problem. It adopts a fully convolutional design with the cascaded 2D convolution based spatial encoder and 1D convolution based temporal encoder-decoder for joint spatio-temporal modeling. In particular, to address the key issue of capturing discriminative long-term dynamic dependency, our temporal model, referred to as Temporal Hourglass Convolutional Neural Network (TH-CNN), extracts contextual relationship through integrating both low-level encoded and high-level decoded clues. Temporal Intermediate Supervision (TIS) is then introduced to enhance affective representations generated by TH-CNN under a multi-resolution strategy, which guides TH-CNN to learn macroscopic long-term trend and refined short-term fluctuations progressively. Furthermore, thanks to TH-CNN and TIS, knowledge learnt from the intermediate layers also makes it possible to offer customized solutions to different applications by adjusting the decoder depth. Extensive experiments are conducted on three benchmark databases (RECOLA, SEWA and OMG) and superior results are shown compared to state-of-the-art methods, which indicates the effectiveness of the proposed approach.
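The temporal part of the abstract (a 1D-convolutional encoder-decoder whose decoder is supervised at several temporal resolutions) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one scalar feature per frame instead of the feature vectors a 2D spatial encoder would produce, a fixed smoothing kernel instead of learned weights, average pooling and nearest-neighbour upsampling, additive skip connections, and a mean-squared-error loss. All function names are hypothetical.

```python
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1D cross-correlation; output has the same length as x."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(len(x))]


def downsample(x: Sequence[float]) -> List[float]:
    """Halve the temporal resolution by averaging adjacent pairs (even length assumed)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def upsample(x: Sequence[float]) -> List[float]:
    """Double the temporal resolution by repeating each value."""
    return [v for v in x for _ in range(2)]


def temporal_hourglass(features: Sequence[float], depth: int,
                       kernel: Sequence[float]) -> List[List[float]]:
    """Return the bottleneck and each decoder stage, coarsest resolution first.

    The sequence length is assumed to be divisible by 2 ** depth.
    """
    skips = []
    h = list(features)
    for _ in range(depth):                     # encoder: convolve, then shrink
        h = conv1d_same(h, kernel)
        skips.append(h)
        h = downsample(h)
    outputs = [h]                              # coarsest prediction
    for skip in reversed(skips):               # decoder: grow, fuse, convolve
        h = upsample(h)
        h = [a + b for a, b in zip(h, skip)]   # fuse encoded and decoded cues
        h = conv1d_same(h, kernel)
        outputs.append(h)
    return outputs


def multi_resolution_targets(labels: Sequence[float], depth: int) -> List[List[float]]:
    """Downsample the label sequence so each decoder stage has a matching target."""
    targets = [list(labels)]
    for _ in range(depth):
        targets.append(downsample(targets[-1]))
    return targets[::-1]                       # coarsest first, like the outputs


def intermediate_supervision_loss(outputs: Sequence[Sequence[float]],
                                  targets: Sequence[Sequence[float]]) -> float:
    """Sum of per-stage mean squared errors (an assumed choice of loss)."""
    total = 0.0
    for out, tgt in zip(outputs, targets):
        total += sum((o - t) ** 2 for o, t in zip(out, tgt)) / len(tgt)
    return total
```

In this sketch the coarse stages are compared with smoothed labels (the long-term trend) and the finest stage with the original labels (short-term fluctuations). Reading a prediction from an earlier stage corresponds loosely to using a shallower decoder.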
Pages: 565-578
Number of pages: 14
Related papers
50 in total
  • [21] The exploration of a Temporal Convolutional Network combined with Encoder-Decoder framework for runoff forecasting
    Lin, Kangling
    Sheng, Sheng
    Zhou, Yanlai
    Liu, Feng
    Li, Zhiyu
    Chen, Hua
    Xu, Chong-Yu
    Chen, Jie
    Guo, Shenglian
    HYDROLOGY RESEARCH, 2020, 51 (05): : 1136 - 1149
  • [22] Predicting origin-destination ride-sourcing demand with a spatio-temporal encoder-decoder residual multi-graph convolutional network
    Ke, Jintao
    Qin, Xiaoran
    Yang, Hai
    Zheng, Zhengfei
    Zhu, Zheng
    Ye, Jieping
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2021, 122
  • [23] Spatio-Temporal PM2.5 Forecasting in Thailand Using Encoder-Decoder Networks
    Sirisumpun, Natch
    Wongwailikhit, Kritchart
    Painmanakul, Pisut
    Vateekul, Peerapon
    IEEE ACCESS, 2023, 11 : 69601 - 69613
  • [24] Deep Learning Based Video Spatio-Temporal Modeling for Emotion Recognition
    Fonnegra, Ruben D.
    Diaz, Gloria M.
    HUMAN-COMPUTER INTERACTION: THEORIES, METHODS, AND HUMAN ISSUES, HCI INTERNATIONAL 2018, PT I, 2018, 10901 : 397 - 408
  • [25] Detection of black box signal based on encoder-decoder fully convolutional networks
    Ji, Huazhong
    Zhou, Jie
    Pan, Xiang
    GLOBAL OCEANS 2020: SINGAPORE - U.S. GULF COAST, 2020,
  • [26] Adaptive Spatio-Temporal Convolutional Network for Video Deblurring
    Duan, Fengzhi
    Yao, Hongxun
    IMAGE AND GRAPHICS (ICIG 2021), PT III, 2021, 12890 : 777 - 788
  • [27] Predicting COVID-19 Lung Infiltrate Progression on Chest Radiographs Using Spatio-temporal LSTM based Encoder-Decoder Network
    Konwer, Aishik
    Bae, Joseph
    Singh, Gagandeep
    Gattu, Rishabh
    Ali, Syed
    Green, Jeremy
    Phatak, Tej
    Gupta, Amit
    Chen, Chao
    Saltz, Joel
    Prasanna, Prateek
    MEDICAL IMAGING WITH DEEP LEARNING, VOL 143, 2021, 143 : 384 - 398
  • [28] Vision-Based Autonomous Crack Detection of Concrete Structures Using a Fully Convolutional Encoder-Decoder Network
    Islam, M. M. Manjurul
    Kim, Jong-Myon
    SENSORS, 2019, 19 (19)
  • [29] Recognition of complex power lines based on novel encoder-decoder network
    Li Y.
    Li H.
    Zhang K.
    Wang B.
    Guan S.
    Chen Y.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (06): : 1133 - 1141
  • [30] Encoder-Decoder Convolutional Neural Network based Iris-Sclera Segmentation
    Sahin, Gurkan
    Susuz, Orkun
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,