Scripted Video Generation With a Bottom-Up Generative Adversarial Network

Cited: 14
Authors
Chen, Qi [1 ,2 ]
Wu, Qi [3 ]
Chen, Jian [1 ]
Wu, Qingyao [1 ]
van den Hengel, Anton [3 ]
Tan, Mingkui [1 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou 510640, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] Univ Adelaide, Sch Comp Sci, Adelaide, SA 5005, Australia
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial networks; video generation; semantic alignment; temporal coherence
DOI
10.1109/TIP.2020.3003227
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Generating videos given a text description (such as a script) is non-trivial due to the intrinsic complexity of image frames and the structure of videos. Although Generative Adversarial Networks (GANs) have been successfully applied to generate images conditioned on a natural language description, it remains very challenging to generate realistic videos, whose frames must exhibit both spatial and temporal coherence. In this paper, we propose a novel Bottom-up GAN (BoGAN) method for generating videos given a text description. To ensure the coherence of the generated frames and to make the whole video semantically match the language description, we design a bottom-up optimisation mechanism to train BoGAN. Specifically, we devise a region-level loss via an attention mechanism to preserve local semantic alignment and to draw details in different sub-regions of the video conditioned on the words most relevant to them. Moreover, to guarantee matching between the text and each frame, we introduce a frame-level discriminator, which also maintains the fidelity of each frame and the coherence across frames. Last, to ensure global semantic alignment between the whole video and the given text, we apply a video-level discriminator. We evaluate the effectiveness of the proposed BoGAN on two synthetic datasets (i.e., SBMG and TBMG) and two real-world datasets (i.e., MSVD and KTH).
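To make the three-level ("bottom-up") objective described in the abstract concrete, the following is a minimal PyTorch-style sketch of how region-, frame-, and video-level terms could be combined into one generator loss. All module names (FrameDiscriminator, VideoDiscriminator, region_attention_loss), architectures, and loss weights are illustrative assumptions, not the authors' implementation; the region term follows a DAMSM-style word-region attention commonly used in text-to-image GANs.

```python
# Minimal sketch of a bottom-up, three-level adversarial objective (PyTorch).
# Architectures, names, and loss weights are illustrative assumptions,
# NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameDiscriminator(nn.Module):
    """Scores each frame jointly with the sentence embedding."""
    def __init__(self, feat_dim=256, text_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(            # per-frame 2D conv encoder
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, feat_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.score = nn.Linear(feat_dim + text_dim, 1)

    def forward(self, frames, text):             # frames: (B*T, 3, H, W)
        h = self.encoder(frames)
        # Repeat the sentence vector T times so it pairs with every frame.
        text = text.repeat_interleave(frames.size(0) // text.size(0), dim=0)
        return self.score(torch.cat([h, text], dim=1))

class VideoDiscriminator(nn.Module):
    """Scores the whole clip jointly with the sentence embedding;
    3D convolutions capture temporal coherence across frames."""
    def __init__(self, feat_dim=256, text_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, feat_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.score = nn.Linear(feat_dim + text_dim, 1)

    def forward(self, video, text):              # video: (B, 3, T, H, W)
        return self.score(torch.cat([self.encoder(video), text], dim=1))

def region_attention_loss(region_feats, word_feats):
    """Word-to-region alignment (DAMSM-style, assumed form): each word
    attends over sub-region features and should be close to its
    attended context vector."""
    # region_feats: (B, R, D); word_feats: (B, W, D)
    attn = torch.softmax(word_feats @ region_feats.transpose(1, 2), dim=-1)
    context = attn @ region_feats                # (B, W, D)
    return (1 - F.cosine_similarity(context, word_feats, dim=-1)).mean()

bce = nn.BCEWithLogitsLoss()

def generator_loss(fake_video, text, region_feats, word_feats,
                   d_frame, d_video, lambda_r=1.0, lambda_f=1.0, lambda_v=1.0):
    """Region-level alignment plus frame- and video-level adversarial
    terms, summed with illustrative weights."""
    B, C, T, H, W = fake_video.shape
    frames = fake_video.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W)
    l_region = region_attention_loss(region_feats, word_feats)
    l_frame = bce(d_frame(frames, text), torch.ones(B * T, 1))
    l_video = bce(d_video(fake_video, text), torch.ones(B, 1))
    return lambda_r * l_region + lambda_f * l_frame + lambda_v * l_video

# Quick shape check with random tensors (assumed sizes: B=2, T=8, 32x32 frames).
B, T, D = 2, 8, 256
loss = generator_loss(torch.randn(B, 3, T, 32, 32), torch.randn(B, D),
                      torch.randn(B, 16, D),        # e.g. a 4x4 grid of regions
                      torch.randn(B, 5, D),         # 5 word embeddings
                      FrameDiscriminator(), VideoDiscriminator())
```

The abstract describes these levels as being optimised bottom-up (region, then frame, then video); the plain weighted sum above is only a simplification of that idea for illustration.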
Pages: 7454-7467
Page count: 14