Cross-Attention Fusion of Visual and Geometric Features for Large-Vocabulary Arabic Lipreading

Cited by: 0

Authors:

Daou, Samar ^{[1]}

Ben-Hamadou, Achraf ^{[1,2]}

Rekik, Ahmed ^{[1,3]}

Kallel, Abdelaziz ^{[1,2]}

Affiliations:

[1] Technopk Sfax, SMARTS Lab, Sfax 3021, Tunisia

[2] Technopole Sfax, Digital Res Ctr Sfax, Sfax 3021, Tunisia

[3] Gafsa Univ, ISSAT Inst Super Sci Appl & Technol, Sidi Ahmed Zarrouk Univ Campus, Gafsa 2112, Tunisia

Source:

TECHNOLOGIES | 2025 / Vol. 13 / Issue 01

Keywords:

lipreading; deep learning; LRW-AR; graph neural networks; Transformer; Arabic language;

DOI:

10.3390/technologies13010026

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

Lipreading involves recognizing spoken words by analyzing the movements of the lips and surrounding area using visual data. It is an emerging research topic with many potential applications, such as human-machine interaction and enhancing audio-based speech recognition. Recent deep learning approaches integrate visual features from the mouth region and lip contours. However, simple methods such as concatenation may not effectively optimize the feature vector. In this article, we propose extracting optimal visual features using 3D convolution blocks followed by a ResNet-18, while employing a graph neural network to extract geometric features from tracked lip landmarks. To fuse these complementary features, we introduce a cross-attention mechanism that combines visual and geometric information to obtain an optimal representation of lip movements for lipreading tasks. To validate our approach for Arabic, we introduce the first large-scale Lipreading in the Wild for Arabic (LRW-AR) dataset, consisting of 20,000 videos across 100 word classes, spoken by 36 speakers. Experimental results on both the LRW-AR and LRW datasets demonstrate the effectiveness of our approach, achieving accuracies of 85.85% and 89.41%, respectively.
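The cross-attention fusion described in the abstract can be illustrated with a minimal single-head sketch in NumPy. This is not the authors' implementation. The sequence length, feature dimension, random projection matrices, the choice of visual features as queries and geometric features as keys and values, and the residual sum used for fusion are all assumptions made for illustration. The random arrays stand in for the outputs of the 3D convolution plus ResNet-18 branch and the graph neural network branch.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Queries come from one modality, keys and values from the other.
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
T, d = 29, 64  # hypothetical: frames per clip and feature dimension

# Stand-ins for per-frame features from the two branches (assumed shapes).
visual = rng.standard_normal((T, d))     # e.g. 3D conv + ResNet-18 output
geometric = rng.standard_normal((T, d))  # e.g. GNN output on lip landmarks

# Hypothetical projection matrices; in a trained model these are learned.
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# Visual frames attend to geometric frames (assumed direction).
attended, weights = cross_attention(visual, geometric, w_q, w_k, w_v)

# Residual fusion (an assumption): keep the visual features and add the
# geometry-conditioned context.
fused = visual + attended
```

Unlike plain concatenation, each visual frame here receives a weighted mixture of geometric frames, with weights that depend on the content of both modalities.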


Pages: 22
