Toward Multi-Modal Conditioned Fashion Image Translation

Cited by: 13
Authors
Gu, Xiaoling [1 ]
Yu, Jun [1 ]
Wong, Yongkang [2 ]
Kankanhalli, Mohan S. [2 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Key Lab Complex Syst Modeling & Simulat, Hangzhou 310018, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 119613, Singapore
Funding
US National Science Foundation; National Research Foundation of Singapore;
Keywords
Generative adversarial network; fashion image synthesis; image-to-image translation; RETRIEVAL;
DOI
10.1109/TMM.2020.3009500
Chinese Library Classification: TP [Automation Technology, Computer Technology];
Discipline Code: 0812;
Abstract
Having the capability to synthesize photo-realistic fashion product images conditioned on multiple attributes or modalities would enable many exciting new applications. In this work, we propose an end-to-end network architecture, built upon a new generative adversarial network, for automatically synthesizing photo-realistic images of fashion products under multiple conditions. Given an input image that provides a 2D skeleton pose, together with a sentence describing the products, our model synthesizes a fashion image that preserves the same pose while wearing the fashion products described by the text. Specifically, the generator G tries to generate realistic-looking fashion images conditioned on a <pose, text> pair to fool the discriminator. An attention network augments the generator by predicting a probability map that indicates which parts of the image need to be attended to for translation. In contrast, the discriminator D distinguishes real images from translated ones based on the input pose image and text description. The discriminator is split into two multi-scale sub-discriminators to improve its ability to distinguish images. Quantitative and qualitative analysis demonstrates that our method synthesizes realistic images that retain the poses of the given images while matching the semantics of the provided sentence descriptions.
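The abstract says the attention network predicts a probability map marking which image regions should be translated, but does not spell out how that map is applied. A common formulation in attention-guided image translation, sketched below as an assumption rather than the paper's exact method, blends the raw translated output with the source image using the attention map as per-pixel weights (the function name `attention_blend` is hypothetical):

```python
import numpy as np

def attention_blend(source, translated, attention):
    """Compose the final image: regions with high attention are taken
    from the translated image, the rest is copied from the source.

    source, translated: (H, W, C) float arrays in [0, 1]
    attention: (H, W) probability map in [0, 1]
    """
    a = attention[..., None]  # add a channel axis so it broadcasts over C
    return a * translated + (1.0 - a) * source

# Toy example: attend only to the left column of a 2x2 RGB image.
src = np.zeros((2, 2, 3))                # source image (all black)
tgt = np.ones((2, 2, 3))                 # translated image (all white)
att = np.array([[1.0, 0.0], [1.0, 0.0]])  # attend to left column only
out = attention_blend(src, tgt, att)     # left column white, right black
```

This kind of masked blend lets the generator leave pose-irrelevant background pixels untouched and concentrate its capacity on the garment regions the text description refers to.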
Pages: 2361-2371
Page count: 11
Related Papers (50 total)
  • [21] FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German
    Lefakis, Leonidas
    Akbik, Alan
    Vollgraf, Roland
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 451 - 454
  • [22] Attention-Aided Generative Learning for Multi-Scale Multi-Modal Fundus Image Translation
    Pham, Van-Nguyen
    Le, Duc-Tai
    Bum, Junghyun
    Lee, Eun Jung
    Han, Jong Chul
    Choo, Hyunseung
    IEEE ACCESS, 2023, 11 : 51701 - 51711
  • [23] Unsupervised multi-modal modeling of fashion styles with visual attributes
    Peng, Dunlu
    Liu, Rui
    Lu, Jing
    Zhang, Shuming
    APPLIED SOFT COMPUTING, 2022, 115
  • [24] Partial Modal Conditioned GANs for Multi-modal Multi-label Learning with Arbitrary Modal-Missing
    Zhang, Yi
    Shen, Jundong
    Zhang, Zhecheng
    Wang, Chongjun
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 413 - 428
  • [25] Cross-modal attention for multi-modal image registration
    Song, Xinrui
    Chao, Hanqing
    Xu, Xuanang
    Guo, Hengtao
    Xu, Sheng
    Turkbey, Baris
    Wood, Bradford J.
    Sanford, Thomas
    Wang, Ge
    Yan, Pingkun
    MEDICAL IMAGE ANALYSIS, 2022, 82
  • [26] Contrastive Adversarial Training for Multi-Modal Machine Translation
    Huang, Xin
    Zhang, Jiajun
    Zong, Chengqing
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [27] Video Pivoting Unsupervised Multi-Modal Machine Translation
    Li, Mingjie
    Huang, Po-Yao
    Chang, Xiaojun
    Hu, Junjie
    Yang, Yi
    Hauptmann, Alex
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3918 - 3932
  • [28] Imaginations Generate Images for Multi-modal Machine Translation
    Yang, Xiaona
    Sun, Wenli
    Wei, Wei
    Li, Yinlin
    Shi, Xiayang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 120 - 128
  • [29] Multi-Modal and Multi-Domain Embedding Learning for Fashion Retrieval and Analysis
    Gu, Xiaoling
    Wong, Yongkang
    Shou, Lidan
    Peng, Pai
    Chen, Gang
    Kankanhalli, Mohan S.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (06) : 1524 - 1537
  • [30] A variational approach to multi-modal image matching
    Chefd'Hotel, C
    Hermosillo, G
    Faugeras, O
    IEEE WORKSHOP ON VARIATIONAL AND LEVEL SET METHODS IN COMPUTER VISION, PROCEEDINGS, 2001, : 21 - 28