ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION

Cited by: 0
Authors
Wang, Qianrui [1 ]
Li, Dengshi [1 ]
Liao, Liang [2 ]
Song, Hao [1 ]
Li, Wei [1 ]
Xiao, Jing [3 ]
Affiliations
[1] Jianghan Univ, Sch Artificial Intelligence, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Speaker video; video frame interpolation; audio;
DOI
10.1109/ICIP49359.2023.10222345
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to limited data-transmission bandwidth, the video frame rate during online conferences is often low, severely degrading the user experience. Video frame interpolation can address this problem by synthesizing intermediate frames to increase the frame rate. Most existing video frame interpolation methods rest on a linear-motion assumption. However, mouth motion is nonlinear, so these methods cannot generate high-quality intermediate frames for speaker video. Considering the strong correlation between mouth shape and vocalization, a new method is proposed, named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract the audio feature with Audio Net (ANet). Second, we extract the video feature with the Video Net (VNet) encoder. Finally, we fuse the audio and video features with AVFusion and decode the intermediate frame in the VNet decoder. Experimental results show that PSNR is nearly 0.13 dB higher than the baseline when interpolating one frame, and 0.33 dB higher when interpolating seven frames.
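The abstract describes a three-stage pipeline: ANet encodes audio, the VNet encoder encodes the two boundary frames, AVFusion merges the two features, and the VNet decoder emits the intermediate frame. The toy sketch below illustrates only this data flow; every function body, name style, and feature shape is an illustrative assumption (the paper's networks are learned models, not these stand-ins).

```python
# Minimal data-flow sketch of the ASVFI pipeline described in the abstract.
# Each stage is a toy stand-in for a learned network; shapes and logic here
# are assumptions for illustration only.

def anet(audio_window):
    """Audio Net stand-in: reduce an audio window to a small feature vector."""
    mean = sum(audio_window) / len(audio_window)
    return [mean, max(audio_window) - min(audio_window)]

def vnet_encoder(frame_a, frame_b):
    """VNet encoder stand-in: a video feature from the two boundary frames."""
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]

def avfusion(audio_feat, video_feat):
    """AVFusion stand-in: combine audio and video features (here: concat)."""
    return audio_feat + video_feat

def vnet_decoder(fused, frame_len):
    """VNet decoder stand-in: map the fused feature to an intermediate frame."""
    return fused[-frame_len:]

def asvfi(frame_a, frame_b, audio_window):
    audio_feat = anet(audio_window)
    video_feat = vnet_encoder(frame_a, frame_b)
    return vnet_decoder(avfusion(audio_feat, video_feat), len(frame_a))

# Two 2-pixel "frames" and a 3-sample audio window (values exact in binary).
mid = asvfi([0.0, 0.25], [1.0, 0.75], [0.0, 0.5, 0.25])
print(mid)  # [0.5, 0.5] — the midpoint of the two frames in this toy version
```

In the real model the audio feature steers the nonlinear mouth motion rather than being discarded by the decoder; the sketch only fixes the order of the four stages.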
Pages: 3200 - 3204
Page count: 5
Related Papers
50 records
  • [31] Video Frame Interpolation Transformer
    Shi, Zhihao
    Xu, Xiangyu
    Liu, Xiaohong
    Chen, Jun
    Yang, Ming-Hsuan
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17461 - 17470
  • [32] Audio2AB: Audio-driven collaborative generation of virtual character animation
    Niu L.
    Xie W.
    Wang D.
    Cao Z.
    Liu X.
    Virtual Reality and Intelligent Hardware, 2024, 6 (01): 56 - 70
  • [33] Video Frame Interpolation with Transformer
    Lu, Liying
    Wu, Ruizheng
    Lin, Huaijia
    Lu, Jiangbo
    Jia, Jiaya
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3522 - 3532
  • [34] Audio-Driven Violin Performance Animation with Clear Fingering and Bowing
    Hirata, Asuka
    Tanaka, Keitaro
    Hamanaka, Masatoshi
    Morishima, Shigeo
    PROCEEDINGS OF SIGGRAPH 2022 POSTERS, SIGGRAPH 2022, 2022,
  • [35] Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
    Huang, Ricong
    Lai, Peiwen
    Qin, Yipeng
    Li, Guanbin
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 12759 - 12768
  • [36] Audio-driven emotional speech animation for interactive virtual characters
    Charalambous, Constantinos
    Yumak, Zerrin
    van der Stappen, A. Frank
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)
  • [37] Partial linear regression for audio-driven talking head application
    Hsieh, CK
    Chen, YC
    2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2, 2005, : 281 - 284
  • [38] Spatially and Temporally Optimized Audio-Driven Talking Face Generation
    Dong, Biao
    Ma, Bo-Yao
    Zhang, Lei
    COMPUTER GRAPHICS FORUM, 2024, 43 (07)
  • [39] Audio2AB: Audio-driven collaborative generation of virtual character animation
    Lichao NIU
    Wenjun XIE
    Dong WANG
    Zhongrui CAO
    Xiaoping LIU
    Virtual Reality and Intelligent Hardware, 2024, 6 (01): 56 - 70
  • [40] Audio-Driven Stylized Gesture Generation with Flow-Based Model
    Ye, Sheng
    Wen, Yu-Hui
    Sun, Yanan
    He, Ying
    Zhang, Ziyang
    Wang, Yaoyuan
    He, Weihua
    Liu, Yong-Jin
    COMPUTER VISION - ECCV 2022, PT V, 2022, 13665 : 712 - 728