Attentional bias for hands: Cascade dual-decoder transformer for sign language production

Cited: 0
Authors
Ma, Xiaohan [1 ]
Jin, Rize [2 ]
Wang, Jianming [3 ]
Chung, Tae-Sun [1 ]
Affiliations
[1] Ajou Univ, Dept Artificial Intelligence, Suwon, South Korea
[2] Tiangong Univ, Sch Software, Tianjin, Peoples R China
[3] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China
Keywords
computer vision; natural language processing; pose estimation; sign language production
DOI
10.1049/cvi2.12273
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sign Language Production (SLP) refers to the task of translating textual forms of spoken language into corresponding sign language expressions. Sign languages convey meaning by means of multiple asynchronous articulators, including manual and non-manual information channels. Recent deep learning-based SLP models generate the full-articulatory sign sequence directly from the text input in an end-to-end manner. However, these models largely down-weight subtle differences in manual articulation owing to the effect of regression to the mean. To address these neglected aspects, an efficient cascade dual-decoder Transformer (CasDual-Transformer) for SLP is proposed to learn, successively, two mappings, SLP_hand: Text -> Hand pose and SLP_sign: Text -> Sign pose, utilising an attention-based alignment module that fuses the hand and sign features from previous time steps to predict a more expressive sign pose at the current time step. In addition, to provide more efficacious guidance, a novel spatio-temporal loss is introduced that penalises shape dissimilarity and temporal distortions in the produced sequences. Experimental studies are performed on two benchmark sign language datasets from distinct cultures to verify the performance of the proposed model. Both quantitative and qualitative results show that the authors' model is competitive with state-of-the-art models and, in some cases, achieves considerable improvements over them on both the German and Korean SLP tasks.
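The cascade described in the abstract can be sketched as follows: a first decoder maps text features to a hand-pose sequence, and a second decoder fuses those hand features with attention-aligned text context to emit the full-articulatory sign pose, with a shape term plus a first-difference temporal term standing in for the spatio-temporal loss. This is a minimal NumPy sketch under loud assumptions, not the authors' implementation: random projections replace trained Transformer decoder layers, the keypoint counts (21 joints per hand, 50 full-body), feature width, and the `spatio_temporal_loss` surrogate are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(q, k, v):
    # single-head scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

D = 16                                  # feature width (assumed)
T_TXT, T_POSE = 8, 10                   # text / pose sequence lengths (assumed)
HAND_DIM = 2 * 21 * 3                   # two hands x 21 joints x 3D (assumed)
SIGN_DIM = 50 * 3                       # full-articulatory keypoints (assumed)

# frozen random projections stand in for trained decoder output layers
W_hand = rng.normal(size=(D, HAND_DIM)) * 0.1
W_sign = rng.normal(size=(2 * D, SIGN_DIM)) * 0.1

text_feats = rng.normal(size=(T_TXT, D))   # stands in for the text encoder output

# Stage 1 (SLP_hand): decode a hand-pose sequence from the text features.
hand_feats = attention(rng.normal(size=(T_POSE, D)), text_feats, text_feats)
hand_pose = hand_feats @ W_hand

# Stage 2 (SLP_sign): the alignment module attends back over the text with the
# hand features as queries, then both streams are fused to predict the sign pose.
aligned = attention(hand_feats, text_feats, text_feats)
sign_pose = np.concatenate([hand_feats, aligned], axis=-1) @ W_sign

def spatio_temporal_loss(pred, target, lam=0.5):
    # shape term: per-frame pose dissimilarity
    shape = np.mean(np.abs(pred - target))
    # temporal term: mismatch of frame-to-frame motion (first differences)
    temporal = np.mean(np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)))
    return shape + lam * temporal

print(hand_pose.shape, sign_pose.shape)
```

The cascade is visible in the data flow: `sign_pose` cannot be computed without `hand_feats`, so hand articulation information is injected before the full pose is regressed, which is the mechanism the abstract credits for countering regression to the mean.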
Pages: 696-708
Page count: 13
Related papers
9 records
  • [1] A Cascade Dual-Decoder Model for Joint Entity and Relation Extraction
    Cheng, Jian
    Zhang, Tian
    Zhang, Shuang
    Ren, Huimin
    Yu, Guo
    Zhang, Xiliang
    Gao, Shangce
    Ma, Lianbo
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024: 1-13
  • [2] Dual-decoder transformer network for answer grounding in visual question answering
    Zhu, Liangjun
    Peng, Li
    Zhou, Weinan
    Yang, Jielong
    PATTERN RECOGNITION LETTERS, 2023, 171: 53-60
  • [3] BDFormer: Boundary-aware dual-decoder transformer for skin lesion segmentation
    Ji, Zexuan
    Ye, Yuxuan
    Ma, Xiao
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2025, 162
  • [4] Shadow detection using a cross-attentional dual-decoder network with self-supervised image reconstruction features
    Fernandez-Beltran, Ruben
    Guzman-Ponce, Angelica
    Fernandez, Rafael
    Kang, Jian
    Garcia-Mateos, Gines
    IMAGE AND VISION COMPUTING, 2024, 143
  • [5] AD2T: Multivariate Time-Series Anomaly Detection With Association Discrepancy Dual-Decoder Transformer
    Li, Zezhong
    Guo, Wei
    An, Jianpeng
    Wang, Qi
    Mei, Yingchun
    Juan, Rongshun
    Wang, Tianshu
    Li, Yang
    Gao, Zhongke
    IEEE SENSORS JOURNAL, 2025, 25(07): 11710-11721
  • [6] A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production
    Cui, Zhenchao
    Chen, Ziang
    Li, Zhaoxin
    Wang, Zhaoqi
    SENSORS, 2022, 22(24)
  • [7] Spatial-Temporal Graph Transformer With Sign Mesh Regression for Skinned-Based Sign Language Production
    Cui, Zhenchao
    Chen, Ziang
    Li, Zhaoxin
    Wang, Zhaoqi
    IEEE ACCESS, 2022, 10: 127530-127539
  • [8] FMD-UNet: fine-grained feature squeeze and multiscale cascade dilated semantic aggregation dual-decoder UNet for COVID-19 lung infection segmentation from CT images
    Wang, Wenfeng
    Mao, Qi
    Tian, Yi
    Zhang, Yan
    Xiang, Zhenwu
    Ren, Lijia
    BIOMEDICAL PHYSICS & ENGINEERING EXPRESS, 2024, 10(05)
  • [9] DualSign: Semi-Supervised Sign Language Production with Balanced Multi-Modal Multi-Task Dual Transformation
    Huang, Wencan
    Zhao, Zhou
    He, Jinzheng
    Zhang, Mingmin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 5486-5495