Cross-Attention Fusion of Visual and Geometric Features for Large-Vocabulary Arabic Lipreading

Cited by: 0
Authors
Daou, Samar [1]
Ben-Hamadou, Achraf [1,2]
Rekik, Ahmed [1,3]
Kallel, Abdelaziz [1,2]
Affiliations
[1] Technopk Sfax, SMARTS Lab, Sfax 3021, Tunisia
[2] Technopole Sfax, Digital Res Ctr Sfax, Sfax 3021, Tunisia
[3] Gafsa Univ, ISSAT Inst Super Sci Appl & Technol, Sidi Ahmed Zarrouk Univ Campus, Gafsa 2112, Tunisia
Keywords
lipreading; deep learning; LRW-AR; graph neural networks; Transformer; Arabic language
DOI
10.3390/technologies13010026
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Lipreading involves recognizing spoken words by analyzing the movements of the lips and the surrounding area from visual data. It is an emerging research topic with many potential applications, such as human-machine interaction and the enhancement of audio-based speech recognition. Recent deep learning approaches combine visual features from the mouth region with features derived from the lip contours. However, simple fusion strategies such as concatenation may not produce an effective joint feature vector. In this article, we propose extracting visual features with 3D convolution blocks followed by a ResNet-18, and extracting geometric features with a graph neural network applied to tracked lip landmarks. To fuse these complementary features, we introduce a cross-attention mechanism that combines visual and geometric information into a single representation of lip movements for lipreading tasks. To validate our approach for Arabic, we introduce the first large-scale Lipreading in the Wild for Arabic (LRW-AR) dataset, consisting of 20,000 videos across 100 word classes spoken by 36 speakers. Experimental results on the LRW-AR and LRW datasets demonstrate the effectiveness of our approach, with accuracies of 85.85% and 89.41%, respectively.
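The cross-attention fusion step in the abstract can be illustrated with a minimal, single-head scaled dot-product cross-attention, sketched here in plain Python so it runs without dependencies. The abstract does not say which stream supplies the queries, how many heads are used, or how the attended output is merged, so the following are assumptions for illustration only: per-frame visual features act as queries, per-frame geometric (GNN) features act as keys and values, and the hypothetical `fuse` helper adds the attended geometric information back onto the visual features as a residual. The projection matrices `w_q`, `w_k`, `w_v` would be learned in a real model; identity matrices are used below purely to exercise the code.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, context: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention.

    queries: (T x d_q) rows, one per video frame (assumed: visual features).
    context: (S x d_c) rows attended to (assumed: geometric features).
    Returns a (T x d_v) matrix: each query row becomes a weighted
    average of the projected context rows.
    """
    q = matmul(queries, w_q)               # T x d_k
    k = matmul(context, w_k)               # S x d_k
    v = matmul(context, w_v)               # S x d_v
    d_k = len(w_q[0])
    scores = matmul(q, transpose(k))       # T x S similarity scores
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = softmax_rows(scaled)         # each row sums to 1
    return matmul(weights, v)


def fuse(visual: Matrix, geometric: Matrix,
         w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical fusion: residual sum of visual features and attended geometry.

    Requires d_v == visual feature width so the element-wise sum is defined.
    """
    attended = cross_attention(visual, geometric, w_q, w_k, w_v)
    return [[x + y for x, y in zip(v_row, a_row)]
            for v_row, a_row in zip(visual, attended)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # 3 frames, 2-dim
    geometric = [[0.5, 0.0], [0.0, 0.5], [0.2, 0.2]]     # 3 frames, 2-dim
    for row in fuse(visual, geometric, identity, identity, identity):
        print([round(x, 3) for x in row])
```

Unlike concatenation, which places the two feature vectors side by side, each visual frame here selects the geometric frames it finds most relevant through the softmax weights. A practical implementation would use a tensor library, multiple heads, and learned projections, and would operate on the outputs of the ResNet-18 and graph-network branches.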
Pages: 22