Toward Multi-Modal Conditioned Fashion Image Translation

Cited by: 13
Authors
Gu, Xiaoling [1]
Yu, Jun [1]
Wong, Yongkang [2]
Kankanhalli, Mohan S. [2]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Key Lab Complex Syst Modeling & Simulat, Hangzhou 310018, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 119613, Singapore
Funding
US National Science Foundation; National Research Foundation of Singapore;
Keywords
Generative adversarial network; fashion image synthesis; image-to-image translation; retrieval;
DOI
10.1109/TMM.2020.3009500
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline Code
0812
Abstract
The capability to synthesize photo-realistic fashion product images conditioned on multiple attributes or modalities would enable many exciting new applications. In this work, we propose an end-to-end architecture built upon a new generative adversarial network for automatically synthesizing photo-realistic images of fashion products under multiple conditions. Given an input image containing a 2D skeleton pose and a sentence describing fashion products, our model synthesizes a fashion image that preserves the given pose while wearing the products described in the text. Specifically, the generator G tries to generate realistic-looking fashion images conditioned on a <pose, text> pair to fool the discriminator. An attention network augments the generator by predicting a probability map that indicates which parts of the image should be attended to for translation. In turn, the discriminator D distinguishes real images from translated ones based on the input pose image and text description, and is split into two multi-scale sub-discriminators to improve this discrimination task. Quantitative and qualitative analysis demonstrates that our method synthesizes realistic images that retain the poses of the given images while matching the semantics of the provided sentence descriptions.
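The attention map described in the abstract is commonly realized as a soft mask that blends the generator's translated output with the untouched input, so only the attended regions change. A minimal sketch of that blending step, assuming this common soft-mask formulation from attention-guided GANs; the function name, shapes, and toy values are illustrative and not taken from the paper's code:

```python
import numpy as np

def attention_blend(input_image, translated_image, attention_map):
    """Soft-mask blending used in attention-guided image translation.

    attention_map holds probabilities in [0, 1]: pixels with weight 1
    are taken from the translated image, pixels with weight 0 are
    copied unchanged from the input. Shapes (hypothetical): images are
    (H, W, C), the map is (H, W, 1) and broadcasts over channels.
    """
    return attention_map * translated_image + (1.0 - attention_map) * input_image

# Toy example: attend only to the left column of a 2x2 "image".
inp = np.zeros((2, 2, 3))            # black input image
gen = np.ones((2, 2, 3))             # generator output (all white)
att = np.array([[[1.0], [0.0]],
                [[1.0], [0.0]]])     # attend to left column only
out = attention_blend(inp, gen, att)
# Left column comes from the generator, right column from the input.
```

Because the mask is soft, gradients flow through both terms, letting the attention network and generator train jointly under the adversarial loss.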
Pages: 2361-2371 (11 pages)
Related Papers (50 total)
  • [1] Multi-modal simultaneous machine translation fusion of image information
    Huang, Yan
    Wang, Zhanyang
    Zhang, TianYuan
    Xu, Chun
    Liang, Hui
    JOURNAL OF ENGINEERING RESEARCH, 2023, 11 (02):
  • [2] Unsupervised Multi-modal Medical Image Registration via Invertible Translation
    Guo, Mengjie
    COMPUTER VISION - ECCV 2024, PT XXXI, 2025, 15089 : 22 - 38
  • [3] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
    Arar, Moab
    Ginger, Yiftach
    Danon, Dov
    Bermano, Amit H.
    Cohen-Or, Daniel
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 13407 - 13416
  • [4] Swin transformer-based GAN for multi-modal medical image translation
    Yan, Shouang
    Wang, Chengyan
    Chen, Weibo
    Lyu, Jun
    FRONTIERS IN ONCOLOGY, 2022, 12
  • [5] IMAGE-ASSISTED TRANSFORMER IN ZERO-RESOURCE MULTI-MODAL TRANSLATION
    Huang, Ping
    Sun, Shiliang
    Yang, Hao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7548 - 7552
  • [6] Semi-supervised multi-modal medical image segmentation with unified translation
    Sun H.
    Wei J.
    Yuan W.
    Li R.
    Computers in Biology and Medicine, 2024, 176
  • [7] MULTI-MODAL JOINT EMBEDDING FOR FASHION PRODUCT RETRIEVAL
    Rubio, A.
    Yu, LongLong
    Simo-Serra, E.
    Moreno-Noguer, F.
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 400 - 404
  • [8] Multi-Modal Embedding for Main Product Detection in Fashion
    Rubio, Antonio
    Yu, LongLong
    Simo-Serra, Edgar
    Moreno-Noguer, Francesc
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 2236 - 2242
  • [9] An error analysis for image-based multi-modal neural machine translation
    Calixto, Iacer
    Liu, Qun
    MACHINE TRANSLATION, 2019, 33 (1-2) : 155 - 177
  • [10] RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation
    Wang, Yan
    Zeng, Yawen
    Liang, Junjie
    Xing, Xiaofen
    Xu, Jin
    Xu, Xiangmin
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 860 - 868