Robotic Facial Emotion Transfer Network Based on Transformer Framework and B-spline Smoothing Constraint

Cited: 0
Authors
Huang Z. [1 ,2 ]
Ren F. [3 ]
Hu M. [2 ]
Liu J. [1 ]
Affiliations
[1] School of Electronic Engineering and Intelligent Manufacturing, Anqing Normal University, Anqing
[2] School of Computer Science and Information, Hefei University of Technology, Hefei
[3] Faculty of Engineering, University of Tokushima, Tokushima
Source
Jiqiren/Robot | 2023, Vol. 45, No. 4
Keywords
cubic B-spline smoothing constraint; facial emotion transfer; humanoid robot; inter-domain cooperative attention; intra-domain deformation attention;
DOI
10.13973/j.cnki.robot.220351
Abstract
To improve the spatial-temporal consistency of facial emotion transfer and reduce the influence of mechanical motion constraints for humanoid robots, a robotic facial emotion transformer (RFEFormer) network based on the Transformer framework and a B-spline smoothing constraint is proposed. The RFEFormer network consists of a facial deformation encoding subnet and an actuation sequence generation subnet. In the facial deformation encoding subnet, an intra-frame spatial attention module, constructed on the dual mechanisms of intra-domain deformation attention and inter-domain cooperative attention, is embedded into the Transformer encoder to represent intra-frame spatial information at different levels and granularities. In the actuation sequence generation subnet, a Transformer decoder performs cross attention between the facial spatio-temporal sequence and the historical motor actuation sequence to make multi-step predictions of the future motor drive sequence. Moreover, a cubic B-spline smoothing constraint is introduced to warp the predicted sequence. Experimental results show that the motor actuation deviation, facial deformation fidelity, and motor motion smoothness of the RFEFormer network are 3.21%, 89.48%, and 90.63%, respectively. Furthermore, the frame rate of real-time facial emotion transfer exceeds 25 frames per second. Compared with related methods, the proposed RFEFormer network not only satisfies the real-time requirement but also improves time-sequence-based indexes such as fidelity and smoothness, to which human perception is more sensitive. © 2023 Chinese Academy of Sciences. All rights reserved.
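The cubic B-spline smoothing constraint described in the abstract, which warps a predicted motor actuation sequence into a smoother trajectory, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single motor channel sampled at uniform time steps and uses SciPy's smoothing-spline fit (`splrep` with degree `k=3` and smoothing factor `s`) as a stand-in for the constraint.

```python
import numpy as np
from scipy.interpolate import splev, splrep


def smooth_actuation_sequence(raw, s=0.5):
    """Fit a cubic (k=3) smoothing B-spline to one motor channel and
    re-evaluate it at the original time steps, yielding a warped,
    smoother actuation sequence."""
    t = np.arange(len(raw), dtype=float)
    tck = splrep(t, raw, k=3, s=s)  # s bounds the sum of squared residuals
    return splev(t, tck)


# Hypothetical noisy multi-step prediction for one motor channel.
rng = np.random.default_rng(0)
raw = np.sin(np.linspace(0, np.pi, 30)) + 0.05 * rng.standard_normal(30)
smoothed = smooth_actuation_sequence(raw)
```

Larger `s` trades tracking accuracy for smoothness; in the paper's terms, this is the tension between motor actuation deviation and motor motion smoothness.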
Pages: 395-408
Page count: 13