Dual-STI: Dual-path spatial-temporal interaction learning for dynamic facial expression recognition

Times cited: 1
Authors
Li, Min [1]
Zhang, Xiaoqin [1]
Fan, Chenxiang [1]
Liao, Tangfei [1]
Xiao, Guobao [2]
Affiliations
[1] Wenzhou Univ, Coll Comp & Artificial Intelligence, Wenzhou 325035, Peoples R China
[2] Tongji Univ, Sch Elect & Informat Engn, Shanghai 201804, Peoples R China
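Funding
National Natural Science Foundation of China
Keywords
Dynamic facial expression recognition; Spatial-temporal feature; Spatial-temporal interaction; Comparative learning; NETWORK; AWARE
DOI
10.1016/j.ins.2024.120953
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract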
Learning facial evolution is crucial for dynamic facial expression recognition. Current recognition methods typically extract temporal features after spatial features to keep computational complexity low. However, these methods struggle to model complex facial evolution because spatial and temporal features do not interact. This paper proposes a novel Dual-path Spatial-Temporal Interaction (Dual-STI) framework that extracts spatial and temporal features concurrently through two efficient paths. Specifically, Dual-STI comprises a spatial path and a temporal path. The spatial path contains several spatial transformers that capture robust facial features from each sampled frame, while the temporal path contains several temporal transformers that learn rich contextual facial features from the frame sequence. To facilitate spatial-temporal interaction, Dual-STI includes a dedicated dual-path interaction module that adaptively fuses spatial and temporal features by combining spatial and temporal attention mechanisms. In addition, comparative learning is introduced into the loss function to strengthen this interaction. The proposed method is evaluated through extensive experiments on three popular benchmarks: DFEW, AFEW, and FERV39k. The results show that Dual-STI achieves state-of-the-art performance with low computational complexity on all three datasets. Notably, Dual-STI yields significant gains in the "disgust" and "fear" categories on DFEW, improving precision by 3.45% and 2.1%, respectively.
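The abstract describes the two paths and their adaptive fusion only at a high level. As an illustration of the general idea, the following is a minimal NumPy sketch, not the authors' implementation: spatial self-attention runs among the patch tokens within each frame, temporal self-attention runs across frames at each token position, and a sigmoid gate mixes the two outputs per token. The token layout `(T, N, D)`, the single-head attention with identity query/key/value projections, the gate weight `w_gate`, and every function name here are hypothetical simplifications. The comparative-learning loss is not modeled.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head scaled dot-product self-attention over the second-to-last axis.

    x: (..., n, d) -> (..., n, d). Uses identity Q/K/V projections for brevity
    (hypothetical simplification; real transformers learn these projections).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)   # (..., n, n)
    return softmax(scores, axis=-1) @ x                 # (..., n, d)

def spatial_path(tokens):
    # tokens: (T, N, D). Each frame attends only among its own N tokens.
    return self_attention(tokens)

def temporal_path(tokens):
    # Each token position attends across the T frames.
    x = np.swapaxes(tokens, 0, 1)                       # (N, T, D)
    return np.swapaxes(self_attention(x), 0, 1)         # back to (T, N, D)

def dual_path_fuse(tokens, w_gate):
    """Run both paths in parallel on the same tokens and mix them with a gate.

    w_gate: (2 * D, 1), hypothetical learned weights producing one gate per token.
    """
    s = spatial_path(tokens)
    t = temporal_path(tokens)
    logits = np.concatenate([s, t], axis=-1) @ w_gate   # (T, N, 1)
    gate = 1.0 / (1.0 + np.exp(-logits))                # sigmoid, values in (0, 1)
    return gate * s + (1.0 - gate) * t                  # broadcast over D

rng = np.random.default_rng(0)
T, N, D = 8, 16, 32                                     # frames, tokens per frame, channels
x = rng.standard_normal((T, N, D))
w_gate = 0.1 * rng.standard_normal((2 * D, 1))
print(dual_path_fuse(x, w_gate).shape)                  # (8, 16, 32)
```

Running the two attentions on the same input and mixing their outputs, rather than stacking temporal modeling after spatial modeling, is what separates a parallel dual-path design from the sequential pipelines the abstract criticizes.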
Pages: 15
Related papers
50 in total
  • [31] STPDNET: SPATIAL-TEMPORAL CONVOLUTIONAL PRIMAL DUAL NETWORK FOR DYNAMIC PET IMAGE RECONSTRUCTION
    Hu, Rui
    Cui, Jianan
    Yu, Chengjin
    Chen, Yunmei
    Liu, Huafeng
    2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023
  • [32] Spatial-temporal interaction learning based two-stream network for action recognition
    Liu, Tianyu
    Ma, Yujun
    Yang, Wenhan
    Ji, Wanting
    Wang, Ruili
    Jiang, Ping
    INFORMATION SCIENCES, 2022, 606 : 864 - 876
  • [33] Spatial-temporal dynamic hand gesture recognition via hybrid deep learning model
    Li, Jinghua
    Huai, Huarui
    Gao, Junbin
    Kong, Dehui
    Wang, Lichun
    JOURNAL ON MULTIMODAL USER INTERFACES, 2019, 13 (04) : 363 - 371
  • [35] Learning Expressionlets on Spatio-Temporal Manifold for Dynamic Facial Expression Recognition
    Liu, Mengyi
    Shan, Shiguang
    Wang, Ruiping
    Chen, Xilin
    2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014 : 1749 - 1756
  • [36] A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences
    Fan, Xijian
    Tjahjadi, Tardi
    PATTERN RECOGNITION, 2015, 48 (11) : 3407 - 3416
  • [37] Dynamical Facial Expression Recognition by Integrating 3D Spatial-Temporal Network and Static Network
    Liu, Wenlong
    Han, Shoudong
    Chen, Yang
    PROCEEDINGS OF 2017 2ND INTERNATIONAL CONFERENCE ON COMMUNICATION AND INFORMATION SYSTEMS (ICCIS 2017), 2017 : 304 - 308
  • [38] DuroNet: A Dual-robust Enhanced Spatial-temporal Learning Network for Urban Crime Prediction
    Hu, Kaixi
    Li, Lin
    Liu, Jianquan
    Sun, Daniel
    ACM TRANSACTIONS ON INTERNET TECHNOLOGY, 2021, 21 (01)
  • [39] Dual-Branch Residual Disentangled Adversarial Learning Network for Facial Expression Recognition
    Chen, Puhua
    Wang, Zhe
    Mao, Shasha
    Hui, Xinyue
    Ning, Huyan
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 1840 - 1844
  • [40] Fusing HOG and convolutional neural network spatial-temporal features for video-based facial expression recognition
    Pan, Xianzhang
    IET IMAGE PROCESSING, 2020, 14 (01) : 176 - 182