ASVFI: AUDIO-DRIVEN SPEAKER VIDEO FRAME INTERPOLATION

Cited by: 0
Authors
Wang, Qianrui [1 ]
Li, Dengshi [1 ]
Liao, Liang [2 ]
Song, Hao [1 ]
Li, Wei [1 ]
Xiao, Jing [3 ]
Affiliations
[1] Jianghan Univ, Sch Artificial Intelligence, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Wuhan, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Speaker video; video frame interpolation; audio;
DOI
10.1109/ICIP49359.2023.10222345
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to limited data-transmission bandwidth, the video frame rate during online conferences is often low, severely degrading the user experience. Video frame interpolation can address this problem by synthesizing intermediate frames to increase the frame rate. Most existing video frame interpolation methods rest on a linear-motion assumption. However, mouth motion is nonlinear, so these methods cannot generate high-quality intermediate frames for speaker video. Considering the strong correlation between mouth shape and vocalization, a new method is proposed, named Audio-driven Speaker Video Frame Interpolation (ASVFI). First, we extract the audio feature with Audio Net (ANet). Second, we extract the video feature with the Video Net (VNet) encoder. Finally, we fuse the audio and video features with AVFusion and decode the intermediate frame in the VNet decoder. Experimental results show that PSNR is nearly 0.13 dB higher than the baseline when interpolating one frame, and 0.33 dB higher when interpolating seven frames.
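The abstract describes a three-stage pipeline: ANet encodes audio, the VNet encoder encodes the two boundary frames, AVFusion merges the two features, and the VNet decoder emits the intermediate frame. The toy sketch below illustrates only this data flow; every function body, name style, and feature shape is an illustrative assumption (the paper's networks are learned models, not these stand-ins).

```python
# Minimal data-flow sketch of the ASVFI pipeline described in the abstract.
# Each stage is a toy stand-in for a learned network; shapes and logic here
# are assumptions for illustration only.

def anet(audio_window):
    """Audio Net stand-in: reduce an audio window to a small feature vector."""
    mean = sum(audio_window) / len(audio_window)
    return [mean, max(audio_window) - min(audio_window)]

def vnet_encoder(frame_a, frame_b):
    """VNet encoder stand-in: a video feature from the two boundary frames."""
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]

def avfusion(audio_feat, video_feat):
    """AVFusion stand-in: combine audio and video features (here: concat)."""
    return audio_feat + video_feat

def vnet_decoder(fused, frame_len):
    """VNet decoder stand-in: map the fused feature to an intermediate frame."""
    return fused[-frame_len:]

def asvfi(frame_a, frame_b, audio_window):
    audio_feat = anet(audio_window)
    video_feat = vnet_encoder(frame_a, frame_b)
    return vnet_decoder(avfusion(audio_feat, video_feat), len(frame_a))

# Two 2-pixel "frames" and a 3-sample audio window (values exact in binary).
mid = asvfi([0.0, 0.25], [1.0, 0.75], [0.0, 0.5, 0.25])
print(mid)  # [0.5, 0.5] — the midpoint of the two frames in this toy version
```

In the real model the audio feature steers the nonlinear mouth motion rather than being discarded by the decoder; the sketch only fixes the order of the four stages.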
Pages: 3200 - 3204
Page count: 5
Related Papers
50 records
  • [31] Video Frame Interpolation Transformer
    Shi, Zhihao
    Xu, Xiangyu
    Liu, Xiaohong
    Chen, Jun
    Yang, Ming-Hsuan
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17461 - 17470
  • [32] Audio2AB: Audio-driven collaborative generation of virtual character animation
    Niu L.
    Xie W.
    Wang D.
    Cao Z.
    Liu X.
    Virtual Reality and Intelligent Hardware, 2024, 6 (01): 56 - 70
  • [33] Video Frame Interpolation with Transformer
    Lu, Liying
    Wu, Ruizheng
    Lin, Huaijia
    Lu, Jiangbo
    Jia, Jiaya
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3522 - 3532
  • [34] Audio-Driven Violin Performance Animation with Clear Fingering and Bowing
    Hirata, Asuka
    Tanaka, Keitaro
    Hamanaka, Masatoshi
    Morishima, Shigeo
    PROCEEDINGS OF SIGGRAPH 2022 POSTERS, SIGGRAPH 2022, 2022,
  • [35] Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
    Huang, Ricong
    Lai, Peiwen
    Qin, Yipeng
    Li, Guanbin
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 12759 - 12768
  • [36] Audio-driven emotional speech animation for interactive virtual characters
    Charalambous, Constantinos
    Yumak, Zerrin
    van der Stappen, A. Frank
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)
  • [37] Partial linear regression for audio-driven talking head application
    Hsieh, CK
    Chen, YC
    2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2, 2005, : 281 - 284
  • [38] Spatially and Temporally Optimized Audio-Driven Talking Face Generation
    Dong, Biao
    Ma, Bo-Yao
    Zhang, Lei
    COMPUTER GRAPHICS FORUM, 2024, 43 (07)
  • [39] Audio2AB: Audio-driven collaborative generation of virtual character animation
    Lichao NIU
    Wenjun XIE
    Dong WANG
    Zhongrui CAO
    Xiaoping LIU
    Virtual Reality and Intelligent Hardware, 2024, 6 (01): 56 - 70
  • [40] Audio-Driven Stylized Gesture Generation with Flow-Based Model
    Ye, Sheng
    Wen, Yu-Hui
    Sun, Yanan
    He, Ying
    Zhang, Ziyang
    Wang, Yaoyuan
    He, Weihua
    Liu, Yong-Jin
    COMPUTER VISION - ECCV 2022, PT V, 2022, 13665 : 712 - 728