Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition

Cited by: 19
Authors
Du, Zhengyin [1]
Wu, Suowei [2]
Huang, Di [1]
Li, Weixin [3]
Wang, Yunhong [3]
Affiliations
[1] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sino French Engineer Sch, Beijing 100191, Peoples R China
[3] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Convolution; Decoding; Feature extraction; Videos; Visualization; Task analysis; Dimensional emotion recognition; spatio-temporal fully convolutional network; temporal hourglass CNN; temporal intermediate supervision; Expression recognition
DOI
10.1109/TAFFC.2019.2940224
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video-based dimensional emotion recognition aims to map human affect, as conveyed by visual signals, into a dimensional emotion space; it is a fundamental challenge in affective computing and human-computer interaction. In this paper, we present a novel encoder-decoder framework to tackle this problem. It adopts a fully convolutional design that cascades a 2D-convolution-based spatial encoder with a 1D-convolution-based temporal encoder-decoder for joint spatio-temporal modeling. In particular, to address the key issue of capturing discriminative long-term dynamic dependencies, our temporal model, referred to as the Temporal Hourglass Convolutional Neural Network (TH-CNN), extracts contextual relationships by integrating both low-level encoded and high-level decoded cues. Temporal Intermediate Supervision (TIS) is then introduced to enhance the affective representations generated by TH-CNN under a multi-resolution strategy, guiding TH-CNN to progressively learn the macroscopic long-term trend and the refined short-term fluctuations. Furthermore, thanks to TH-CNN and TIS, the knowledge learned in the intermediate layers also makes it possible to offer solutions customized to different applications by adjusting the decoder depth. Extensive experiments on three benchmark databases (RECOLA, SEWA, and OMG) show superior results compared with state-of-the-art methods, demonstrating the effectiveness of the proposed approach.
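The hourglass-plus-intermediate-supervision idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and every name in it (`temporal_hourglass`, `label_pyramid`, `tis_loss`) is hypothetical. It assumes a single scalar feature per frame standing in for the output of the 2D spatial encoder, a fixed smoothing kernel instead of learned multi-channel 1D convolutions, average pooling for temporal downsampling, nearest-neighbour upsampling, and additive skip connections to fuse encoded and decoded features. Each decoder stage emits a prediction at its own temporal resolution, and the TIS-style loss compares it with the label trace averaged down to that resolution, so coarse stages are pushed toward the long-term trend and the final stage toward frame-level fluctuations.

```python
from typing import List

def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1D cross-correlation with zero 'same' padding (odd kernel length assumed)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[i + j] * w for j, w in enumerate(kernel)) for i in range(len(x))]

def avg_pool2(x: List[float]) -> List[float]:
    """Halve temporal resolution by averaging non-overlapping pairs of frames."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]

def upsample2(x: List[float]) -> List[float]:
    """Double temporal resolution by repeating each value (nearest neighbour)."""
    return [v for v in x for _ in range(2)]

def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]

def temporal_hourglass(x: List[float], depth: int, kernel: List[float]) -> List[List[float]]:
    """Hypothetical TH-CNN-like pass with fixed (not learned) kernels.

    Encoder: conv + ReLU, keep the result as a skip, then pool, `depth` times.
    Decoder: upsample, add the matching encoder skip, conv.
    Returns one prediction per decoder stage, coarsest resolution first.
    """
    if len(x) % (2 ** depth) != 0:
        raise ValueError("sequence length must be divisible by 2**depth")
    skips, h = [], list(x)
    for _ in range(depth):
        h = relu(conv1d_same(h, kernel))
        skips.append(h)            # low-level encoded features at this resolution
        h = avg_pool2(h)
    outputs = []
    for skip in reversed(skips):
        h = upsample2(h)
        h = [a + b for a, b in zip(h, skip)]  # fuse decoded and encoded cues
        h = conv1d_same(h, kernel)
        outputs.append(h)          # intermediate prediction at this resolution
    return outputs

def label_pyramid(y: List[float], levels: int) -> List[List[float]]:
    """Label trace averaged down to each decoder resolution, coarsest first."""
    pyramid = [list(y)]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return list(reversed(pyramid))

def mse(a: List[float], b: List[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def tis_loss(outputs: List[List[float]], y: List[float]) -> float:
    """Sum of per-stage MSE against resolution-matched labels (TIS-style)."""
    return sum(mse(o, t) for o, t in zip(outputs, label_pyramid(y, len(outputs))))

if __name__ == "__main__":
    T, depth = 16, 3
    features = [((t % 5) - 2) / 2.0 for t in range(T)]  # toy per-frame feature
    labels = [t / T for t in range(T)]                   # toy arousal-like trace
    outs = temporal_hourglass(features, depth, kernel=[0.25, 0.5, 0.25])
    print([len(o) for o in outs])  # → [4, 8, 16]
    print(tis_loss(outs, labels))
```

Reading out a coarse stage (for example `upsample2(upsample2(outs[0]))` to return to frame rate) is one way to picture the abstract's point about adjusting decoder depth per application. The pooling, upsampling, and fusion operators here are simplifying assumptions to check against the paper, not its stated design.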
Pages: 565-578
Page count: 14