Toward Multi-Modal Conditioned Fashion Image Translation

Cited by: 13
Authors
Gu, Xiaoling [1 ]
Yu, Jun [1 ]
Wong, Yongkang [2 ]
Kankanhalli, Mohan S. [2 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Key Lab Complex Syst Modeling & Simulat, Hangzhou 310018, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 119613, Singapore
Funding
US National Science Foundation; National Research Foundation of Singapore;
Keywords
Generative adversarial network; fashion image synthesis; image-to-image translation; RETRIEVAL;
DOI
10.1109/TMM.2020.3009500
Chinese Library Classification: TP [Automation Technology, Computer Technology];
Discipline Code: 0812;
Abstract
Having the capability to synthesize photo-realistic fashion product images conditioned on multiple attributes or modalities would enable many exciting new applications. In this work, we propose an end-to-end network architecture, built upon a new generative adversarial network, for automatically synthesizing photo-realistic images of fashion products under multiple conditions. Given an input image that provides a 2D skeleton pose, together with a sentence describing the products, our model synthesizes a fashion image that preserves the same pose while wearing the fashion products described by the text. Specifically, the generator G tries to generate realistic-looking fashion images conditioned on a <pose, text> pair to fool the discriminator. An attention network augments the generator by predicting a probability map that indicates which parts of the image need to be attended to for translation. In contrast, the discriminator D distinguishes real images from translated ones based on the input pose image and text description. The discriminator is split into two multi-scale sub-discriminators to improve its ability to distinguish images. Quantitative and qualitative analysis demonstrates that our method synthesizes realistic images that retain the poses of the given images while matching the semantics of the provided sentence descriptions.
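The abstract says the attention network predicts a probability map marking which image regions should be translated, but does not spell out how that map is applied. A common formulation in attention-guided image translation, sketched below as an assumption rather than the paper's exact method, blends the raw translated output with the source image using the attention map as per-pixel weights (the function name `attention_blend` is hypothetical):

```python
import numpy as np

def attention_blend(source, translated, attention):
    """Compose the final image: regions with high attention are taken
    from the translated image, the rest is copied from the source.

    source, translated: (H, W, C) float arrays in [0, 1]
    attention: (H, W) probability map in [0, 1]
    """
    a = attention[..., None]  # add a channel axis so it broadcasts over C
    return a * translated + (1.0 - a) * source

# Toy example: attend only to the left column of a 2x2 RGB image.
src = np.zeros((2, 2, 3))                # source image (all black)
tgt = np.ones((2, 2, 3))                 # translated image (all white)
att = np.array([[1.0, 0.0], [1.0, 0.0]])  # attend to left column only
out = attention_blend(src, tgt, att)     # left column white, right black
```

This kind of masked blend lets the generator leave pose-irrelevant background pixels untouched and concentrate its capacity on the garment regions the text description refers to.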
Pages: 2361-2371
Page count: 11
Related Papers (50 total)
  • [21] FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German
    Lefakis, Leonidas
    Akbik, Alan
    Vollgraf, Roland
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 451 - 454
  • [22] Attention-Aided Generative Learning for Multi-Scale Multi-Modal Fundus Image Translation
    Pham, Van-Nguyen
    Le, Duc-Tai
    Bum, Junghyun
    Lee, Eun Jung
    Han, Jong Chul
    Choo, Hyunseung
    IEEE ACCESS, 2023, 11 : 51701 - 51711
  • [23] Unsupervised multi-modal modeling of fashion styles with visual attributes
    Peng, Dunlu
    Liu, Rui
    Lu, Jing
    Zhang, Shuming
    APPLIED SOFT COMPUTING, 2022, 115
  • [24] Partial Modal Conditioned GANs for Multi-modal Multi-label Learning with Arbitrary Modal-Missing
    Zhang, Yi
    Shen, Jundong
    Zhang, Zhecheng
    Wang, Chongjun
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 413 - 428
  • [25] Cross-modal attention for multi-modal image registration
    Song, Xinrui
    Chao, Hanqing
    Xu, Xuanang
    Guo, Hengtao
    Xu, Sheng
    Turkbey, Baris
    Wood, Bradford J.
    Sanford, Thomas
    Wang, Ge
    Yan, Pingkun
    MEDICAL IMAGE ANALYSIS, 2022, 82
  • [26] Contrastive Adversarial Training for Multi-Modal Machine Translation
    Huang, Xin
    Zhang, Jiajun
    Zong, Chengqing
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [27] Video Pivoting Unsupervised Multi-Modal Machine Translation
    Li, Mingjie
    Huang, Po-Yao
    Chang, Xiaojun
    Hu, Junjie
    Yang, Yi
    Hauptmann, Alex
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3918 - 3932
  • [28] Imaginations Generate Images for Multi-modal Machine Translation
    Yang, Xiaona
    Sun, Wenli
    Wei, Wei
    Li, Yinlin
    Shi, Xiayang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 120 - 128
  • [29] Multi-Modal and Multi-Domain Embedding Learning for Fashion Retrieval and Analysis
    Gu, Xiaoling
    Wong, Yongkang
    Shou, Lidan
    Peng, Pai
    Chen, Gang
    Kankanhalli, Mohan S.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (06) : 1524 - 1537
  • [30] A variational approach to multi-modal image matching
    Chefd'Hotel, C
    Hermosillo, G
    Faugeras, O
    IEEE WORKSHOP ON VARIATIONAL AND LEVEL SET METHODS IN COMPUTER VISION, PROCEEDINGS, 2001, : 21 - 28