Cross-Attention Fusion of Visual and Geometric Features for Large-Vocabulary Arabic Lipreading

Cited by: 0

Authors:

Daou, Samar [1]

Ben-Hamadou, Achraf [1,2]

Rekik, Ahmed [1,3]

Kallel, Abdelaziz [1,2]

Affiliations:

[1] Technopk Sfax, SMARTS Lab, Sfax 3021, Tunisia

[2] Technopole Sfax, Digital Res Ctr Sfax, Sfax 3021, Tunisia

[3] Gafsa Univ, ISSAT Inst Super Sci Appl & Technol, Sidi Ahmed Zarrouk Univ Campus, Gafsa 2112, Tunisia

Source:

TECHNOLOGIES | 2025 / Vol. 13 / Issue 01

Keywords:

lipreading; deep learning; LRW-AR; graph neural networks; Transformer; Arabic language;

DOI:

10.3390/technologies13010026

Chinese Library Classification number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

Lipreading involves recognizing spoken words by analyzing the movements of the lips and surrounding area using visual data. It is an emerging research topic with many potential applications, such as human-machine interaction and enhancing audio-based speech recognition. Recent deep learning approaches integrate visual features from the mouth region and lip contours. However, simple methods such as concatenation may not effectively optimize the feature vector. In this article, we propose extracting optimal visual features using 3D convolution blocks followed by a ResNet-18, while employing a graph neural network to extract geometric features from tracked lip landmarks. To fuse these complementary features, we introduce a cross-attention mechanism that combines visual and geometric information to obtain an optimal representation of lip movements for lipreading tasks. To validate our approach for Arabic, we introduce the first large-scale Lipreading in the Wild for Arabic (LRW-AR) dataset, consisting of 20,000 videos across 100 word classes, spoken by 36 speakers. Experimental results on both the LRW-AR and LRW datasets demonstrate the effectiveness of our approach, achieving accuracies of 85.85% and 89.41%, respectively.
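The fusion step described in the abstract can be illustrated with a minimal sketch of scaled dot-product cross-attention. This is not the authors' implementation. It assumes that per-frame visual features act as queries and per-frame geometric features act as keys and values (the abstract does not say which stream plays which role), it omits the learned projection layers a trained model would use, and the sizes `T` and `D` and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one stream
    and keys/values come from the other stream."""
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*keys)]      # transpose keys: D x T
    scores = matmul(queries, keys_t)                # T x T similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights         # fused features: T x D


random.seed(0)
T, D = 4, 8  # hypothetical: 4 frames, 8-dimensional features per frame

# Stand-ins for the two feature streams (random values, for illustration only).
visual = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
geometric = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]

# Assumption: visual features query the geometric features.
fused, weights = cross_attention(visual, geometric, geometric)
print(len(fused), len(fused[0]))
```

Each row of `weights` is a probability distribution over the geometric frames, so every fused vector is a weighted average of geometric features selected by the corresponding visual frame. Plain concatenation, by contrast, would simply place the two vectors side by side without modeling their interaction.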

Pages: 22
