Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

Cited by: 262
Authors
Nguyen, Anh [1]
Clune, Jeff [2 ]
Bengio, Yoshua [3 ]
Dosovitskiy, Alexey [4 ]
Yosinski, Jason [5 ]
Affiliations
[1] Univ Wyoming, Laramie, WY 82071 USA
[2] Univ Wyoming, Uber AI Labs, Laramie, WY 82071 USA
[3] Montreal Inst Learning Algorithms, Montreal, PQ, Canada
[4] Univ Freiburg, Freiburg, Germany
[5] Uber AI Labs, Laramie, WY USA
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Funding
U.S. National Science Foundation
DOI
10.1109/CVPR.2017.374
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. [37] showed one interesting way to synthesize novel images by performing gradient ascent in the latent space of a generator network to maximize the activations of one or multiple neurons in a separate classifier network. In this paper we extend this method by introducing an additional prior on the latent code, improving both sample quality and sample diversity, leading to a state-of-the-art generative model that produces high-quality images at higher resolutions (227 x 227) than previous generative models, and does so for all 1000 ImageNet categories. In addition, we provide a unified probabilistic interpretation of related activation maximization methods and call the general class of models "Plug and Play Generative Networks." PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw. We demonstrate the generation of images conditioned on a class (when C is an ImageNet or MIT Places classification network) and also conditioned on a caption (when C is an image captioning network). Our method also improves the state of the art of Multifaceted Feature Visualization [40], which generates the set of synthetic inputs that activate a neuron in order to better understand how deep neural networks operate. Finally, we show that our model performs reasonably well at the task of image inpainting. While image models are used in this paper, the approach is modality-agnostic and can be applied to many types of data.
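As a rough illustration of the sampling procedure the abstract describes, the sketch below iteratively updates a latent code with three terms: a prior term (here approximated by a denoising autoencoder's reconstruction error, consistent with the paper's setup), a condition term from the classifier's gradient, and a small noise term for diversity. This is a minimal hypothetical PyTorch sketch: the names (generator, classifier, denoiser) and the step sizes (eps1, eps2, eps3) are illustrative assumptions, not the authors' released code.

```python
import torch

def ppgn_sample(generator, classifier, denoiser, target_class,
                steps=200, eps1=1e-5, eps2=1.0, eps3=1e-17,
                latent_dim=4096, device="cpu"):
    """Sketch of a PPGN-style sampler: update latent code h so that
    G(h) is a high-probability image of `target_class` under the
    classifier, while a denoising autoencoder approximates the prior.
    All networks and hyperparameters here are hypothetical placeholders."""
    h = torch.zeros(1, latent_dim, device=device, requires_grad=True)
    for _ in range(steps):
        # Condition term: gradient of log p(y | G(h)) with respect to h.
        logits = classifier(generator(h))
        log_prob = torch.log_softmax(logits, dim=1)[0, target_class]
        cond_grad, = torch.autograd.grad(log_prob, h)
        with torch.no_grad():
            # Prior term: the denoiser's reconstruction error R(h) - h
            # approximates the gradient of log p(h).
            prior_grad = denoiser(h) - h
            # Noise term keeps the chain exploring (sample diversity).
            noise = eps3 * torch.randn_like(h)
            h += eps1 * prior_grad + eps2 * cond_grad + noise
    with torch.no_grad():
        return generator(h)
```

The three step sizes trade off prior fidelity, class conditioning, and exploration; the "plug and play" property comes from the fact that the classifier (and hence the condition) can be swapped without retraining the generator.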
Pages: 3510-3520
Page count: 11
Related Papers (66 entries)
[1] Alain G, 2014, J MACH LEARN RES, V15, P3563
[2] [Anonymous], 2015, P IEEE C COMP VIS PA
[3] [Anonymous], GOOGLE RES BLOG
[4] [Anonymous], IMAGE SYNTHESIS YAHO
[5] [Anonymous], 2016, Deep learning
[6] [Anonymous], 2017, P IEEE C COMP VIS PA
[7] [Anonymous], 2009, TECHNICAL REPORT
[8] [Anonymous], DEEP LEARN WORKSH IN
[9] [Anonymous], 2013, PMLR
[10] [Anonymous], 2016, CORR