Toward Multi-Modal Conditioned Fashion Image Translation

Cited by: 13
Authors
Gu, Xiaoling [1]
Yu, Jun [1]
Wong, Yongkang [2]
Kankanhalli, Mohan S. [2]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Key Lab Complex Syst Modeling & Simulat, Hangzhou 310018, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 119613, Singapore
Funding
National Science Foundation (USA); National Research Foundation, Singapore
Keywords
Generative adversarial network; fashion image synthesis; image-to-image translation; RETRIEVAL;
DOI
10.1109/TMM.2020.3009500
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The capability to synthesize photo-realistic fashion product images conditioned on multiple attributes or modalities would enable many exciting new applications. In this work, we propose an end-to-end network architecture, built upon a new generative adversarial network, for automatically synthesizing photo-realistic images of fashion products under multiple conditions. Given an input pose image that consists of a 2D skeleton pose, together with a sentence description of the products, our model synthesizes a fashion image that preserves the same pose and wears the fashion products described in the text. Specifically, the generator G tries to generate realistic-looking fashion images based on a <pose, text> pair condition to fool the discriminator. An attention network is added to enhance the generator; it predicts a probability map indicating which parts of the image need to be attended to for translation. In contrast, the discriminator D distinguishes real images from translated ones based on the input pose image and text description. The discriminator is divided into two multi-scale sub-discriminators to improve the image discrimination task. Quantitative and qualitative analyses demonstrate that our method is capable of synthesizing realistic images that retain the poses of the given images while matching the semantics of the provided sentence descriptions.
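The abstract does not state how the attention probability map is combined with the generator output, nor how the multi-scale discriminator inputs are formed. The sketch below illustrates one common choice in attention-guided image translation, a per-pixel convex blend of the source and translated images, plus a simple 2x average-pooling step for a coarser discriminator scale. Both rules, and the names `blend_with_attention` and `downsample2x`, are assumptions for illustration and are not taken from the paper.

```python
from typing import List

# Hypothetical image representation: a grayscale image as nested lists in [0, 1].
Image = List[List[float]]


def blend_with_attention(source: Image, translated: Image, attention: Image) -> Image:
    """Per-pixel convex blend (assumed rule, not confirmed by the abstract).

    attention = 1 takes the translated pixel, attention = 0 keeps the source pixel.
    """
    blended = []
    for src_row, gen_row, att_row in zip(source, translated, attention):
        row = []
        for s, g, a in zip(src_row, gen_row, att_row):
            if not 0.0 <= a <= 1.0:
                raise ValueError("attention values must be probabilities in [0, 1]")
            row.append(a * g + (1.0 - a) * s)
        blended.append(row)
    return blended


def downsample2x(image: Image) -> Image:
    """Average-pool 2x2 blocks to obtain a coarser scale (assumed pyramid step)."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * i][2 * j] + image[2 * i][2 * j + 1]
             + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(width)
        ]
        for i in range(height)
    ]


source = [[0.0, 0.0], [1.0, 1.0]]
translated = [[1.0, 1.0], [0.0, 0.0]]
attention = [[1.0, 0.0], [0.5, 0.0]]

blended = blend_with_attention(source, translated, attention)
coarse = downsample2x(blended)
```

In a real model the attention map and the translated image would both be network outputs, and each sub-discriminator would score the image at its own scale together with the pose and text condition; those parts are omitted here.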
Pages: 2361-2371
Number of pages: 11