Chinese Lip-Reading Research Based on ShuffleNet and CBAM

Cited by: 10
Authors
Fu, Yixian [1]
Lu, Yuanyao [1]
Ni, Ran [1]
Affiliation
[1] North China University of Technology, School of Information Science and Technology, Beijing 100144, People's Republic of China
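Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 02
Funding
National Natural Science Foundation of China;
Keywords
Chinese lip-reading; ShuffleNet; CBAM; light-weight network;
DOI
10.3390/app13021106
Chinese Library Classification (CLC) number
O6 [Chemistry];
Subject classification code
0703;
Abstract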
Lip reading has attracted increasing attention recently owing to advances in deep learning. However, most research targets English datasets, and the study of Chinese lip-reading technology is still at an early stage. First, in this paper, we expand 'Databox', a naturally distributed word-level Chinese dataset previously built by our laboratory. Second, the current state-of-the-art model consists of a residual network and a temporal convolutional network. The residual network incurs excessive computational cost and is therefore unsuitable for on-device applications. In the new model, the residual network is replaced with ShuffleNet, an extremely computation-efficient Convolutional Neural Network (CNN) architecture. Third, to help the network focus on the most useful information, we insert a simple but effective attention module, the Convolutional Block Attention Module (CBAM), into ShuffleNet. In our experiments, we compare several model architectures and find that our model achieves accuracy comparable to that of the residual network (3.5 GFLOPs) with a computational budget of only 1.01 GFLOPs.
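The abstract combines two published building blocks: ShuffleNet's channel shuffle, which mixes information across grouped convolutions, and CBAM, which rescales a feature map first per channel and then per spatial location. The following is a minimal pure-Python sketch of those two operations only, assuming a list-of-channels layout, bias-free weights, and a toy 3 x 3 spatial kernel (the CBAM paper uses 7 x 7); all function and variable names are hypothetical. It is not the authors' implementation and does not show where CBAM is placed inside the ShuffleNet units or how the temporal convolutional back end is attached.

```python
import math
import random
from typing import List

# Illustrative sketch only, not the authors' code.
# Hypothetical layout: a feature map is a list of C channels, each an H x W grid.
FeatureMap = List[List[List[float]]]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def channel_shuffle(x: list, groups: int) -> list:
    """ShuffleNet channel shuffle: view C channels as (groups, C // groups),
    transpose to (C // groups, groups), and flatten back to C channels."""
    c = len(x)
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = c // groups
    return [x[g * per_group + k] for k in range(per_group) for g in range(groups)]


def channel_attention(x: FeatureMap, w1, w2) -> FeatureMap:
    """CBAM channel attention: a shared two-layer MLP (no biases in this sketch)
    is applied to the average- and max-pooled channel descriptors."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(row) for row in ch) for ch in x]

    def mlp(v):
        # w1: (C // r) x C, followed by ReLU; w2: C x (C // r).
        hidden = [max(0.0, sum(w * vi for w, vi in zip(wrow, v))) for wrow in w1]
        return [sum(w * hv for w, hv in zip(wrow, hidden)) for wrow in w2]

    scale = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, scale)]


def spatial_attention(x: FeatureMap, kernel) -> FeatureMap:
    """CBAM spatial attention: a k x k convolution (zero padding) over the
    channel-wise average and max maps, followed by a sigmoid."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    avg = [[sum(x[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(x[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    pooled = [avg, mx]
    k = len(kernel[0])  # kernel shape: 2 x k x k
    pad = k // 2
    attn = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernel[p][di][dj] * pooled[p][ii][jj]
            attn[i][j] = sigmoid(acc)
    return [[[x[ch][i][j] * attn[i][j] for j in range(w)] for i in range(h)] for ch in range(c)]


def cbam(x: FeatureMap, w1, w2, kernel) -> FeatureMap:
    """Channel attention first, then spatial attention (sequential CBAM order)."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)


if __name__ == "__main__":
    random.seed(0)
    C, H, W, r, k = 4, 5, 5, 2, 3
    x = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
    kernel = [[[random.gauss(0, 0.5) for _ in range(k)] for _ in range(k)] for _ in range(2)]
    y = cbam(channel_shuffle(x, groups=2), w1, w2, kernel)
    print(len(y), len(y[0]), len(y[0][0]))  # shapes are preserved: 4 5 5
```

Both operations preserve the feature-map shape, which is what allows an attention module like CBAM to be dropped into an existing backbone without changing the surrounding layers. Its extra cost comes only from pooling, a small MLP, and a single-output convolution, which is small relative to the convolutions it modulates.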
Pages: 15