Scripted Video Generation With a Bottom-Up Generative Adversarial Network

Cited: 14
Authors
Chen, Qi [1 ,2 ]
Wu, Qi [3 ]
Chen, Jian [1 ]
Wu, Qingyao [1 ]
van den Hengel, Anton [3 ]
Tan, Mingkui [1 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou 510640, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] Univ Adelaide, Sch Comp Sci, Adelaide, SA 5005, Australia
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial networks; video generation; semantic alignment; temporal coherence
DOI
10.1109/TIP.2020.3003227
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Generating videos given a text description (such as a script) is non-trivial due to the intrinsic complexity of image frames and the structure of videos. Although Generative Adversarial Networks (GANs) have been successfully applied to generate images conditioned on a natural language description, it remains very challenging to generate realistic videos, whose frames must exhibit both spatial and temporal coherence. In this paper, we propose a novel Bottom-up GAN (BoGAN) method for generating videos given a text description. To ensure the coherence of the generated frames and to make the whole video semantically match the language description, we design a bottom-up optimisation mechanism to train BoGAN. Specifically, we devise a region-level loss via an attention mechanism to preserve local semantic alignment and to draw details in different sub-regions of the video conditioned on the words most relevant to them. Moreover, to guarantee matching between the text and each frame, we introduce a frame-level discriminator, which also maintains the fidelity of each frame and the coherence across frames. Last, to ensure global semantic alignment between the whole video and the given text, we apply a video-level discriminator. We evaluate the effectiveness of the proposed BoGAN on two synthetic datasets (i.e., SBMG and TBMG) and two real-world datasets (i.e., MSVD and KTH).
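To make the three-level ("bottom-up") objective described in the abstract concrete, the following is a minimal PyTorch-style sketch of how region-, frame-, and video-level terms could be combined into one generator loss. All module names (FrameDiscriminator, VideoDiscriminator, region_attention_loss), architectures, and loss weights are illustrative assumptions, not the authors' implementation; the region term follows a DAMSM-style word-region attention commonly used in text-to-image GANs.

```python
# Minimal sketch of a bottom-up, three-level adversarial objective (PyTorch).
# Architectures, names, and loss weights are illustrative assumptions,
# NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameDiscriminator(nn.Module):
    """Scores each frame jointly with the sentence embedding."""
    def __init__(self, feat_dim=256, text_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(            # per-frame 2D conv encoder
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, feat_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.score = nn.Linear(feat_dim + text_dim, 1)

    def forward(self, frames, text):             # frames: (B*T, 3, H, W)
        h = self.encoder(frames)
        # Repeat the sentence vector T times so it pairs with every frame.
        text = text.repeat_interleave(frames.size(0) // text.size(0), dim=0)
        return self.score(torch.cat([h, text], dim=1))

class VideoDiscriminator(nn.Module):
    """Scores the whole clip jointly with the sentence embedding;
    3D convolutions capture temporal coherence across frames."""
    def __init__(self, feat_dim=256, text_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, feat_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.score = nn.Linear(feat_dim + text_dim, 1)

    def forward(self, video, text):              # video: (B, 3, T, H, W)
        return self.score(torch.cat([self.encoder(video), text], dim=1))

def region_attention_loss(region_feats, word_feats):
    """Word-to-region alignment (DAMSM-style, assumed form): each word
    attends over sub-region features and should be close to its
    attended context vector."""
    # region_feats: (B, R, D); word_feats: (B, W, D)
    attn = torch.softmax(word_feats @ region_feats.transpose(1, 2), dim=-1)
    context = attn @ region_feats                # (B, W, D)
    return (1 - F.cosine_similarity(context, word_feats, dim=-1)).mean()

bce = nn.BCEWithLogitsLoss()

def generator_loss(fake_video, text, region_feats, word_feats,
                   d_frame, d_video, lambda_r=1.0, lambda_f=1.0, lambda_v=1.0):
    """Region-level alignment plus frame- and video-level adversarial
    terms, summed with illustrative weights."""
    B, C, T, H, W = fake_video.shape
    frames = fake_video.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W)
    l_region = region_attention_loss(region_feats, word_feats)
    l_frame = bce(d_frame(frames, text), torch.ones(B * T, 1))
    l_video = bce(d_video(fake_video, text), torch.ones(B, 1))
    return lambda_r * l_region + lambda_f * l_frame + lambda_v * l_video

# Quick shape check with random tensors (assumed sizes: B=2, T=8, 32x32 frames).
B, T, D = 2, 8, 256
loss = generator_loss(torch.randn(B, 3, T, 32, 32), torch.randn(B, D),
                      torch.randn(B, 16, D),        # e.g. a 4x4 grid of regions
                      torch.randn(B, 5, D),         # 5 word embeddings
                      FrameDiscriminator(), VideoDiscriminator())
```

The abstract describes these levels as being optimised bottom-up (region, then frame, then video); the plain weighted sum above is only a simplification of that idea for illustration.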
Pages: 7454-7467
Page count: 14