Unsupervised multi-modal modeling of fashion styles with visual attributes

Cited by: 4
Authors
Peng, Dunlu [1 ]
Liu, Rui [1 ]
Lu, Jing [1 ]
Zhang, Shuming [1 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fashion style modeling; Convolutional neural network; Polylingual topic model; Machine learning; LATENT;
DOI
10.1016/j.asoc.2021.108214
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Fashion compatibility learning is of great practical significance for satisfying the needs of consumers and promoting the development of the apparel industry. As one of its core tasks, fashion style modeling has received extensive attention. In this work, we apply a polylingual topic model, PolyLDA, to discover fashion styles. To build visual documents for fashion images, a convolutional neural network, ResNet-50, pre-trained on ImageNet, is employed in the model. The kernels in different layers of the network encode visual attributes at different levels (such as color, texture, and pattern), so a particular kernel in a given layer can be expressed as a visual word (e.g., red, wavy, or floral design). To construct the visual document for a fashion image, all kernels are therefore treated directly as visual words, and a kernel's activation is regarded as an occurrence of the corresponding visual attribute. By minimizing the variance of the style distributions that PolyLDA produces on the training set, we learn a weight for each layer and assign it to that layer's visual attributes, which gives the model better style-modeling ability than the compared baselines. The proposed method is completely unsupervised and cost-saving. Experimental results show that the model not only produces results almost identical to manual discrimination, but also achieves high satisfaction in similar-style retrieval. (C) 2021 Elsevier B.V. All rights reserved.
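The pipeline described in the abstract (pre-trained ResNet-50 kernels as visual words, kernel activations as word occurrences, and a topic model over the resulting visual documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of hooked layers, the global-average pooling of each kernel's feature map, the integer count scaling, and the use of scikit-learn's standard LDA in place of PolyLDA are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of building "visual documents" from
# ResNet-50 kernel activations and fitting a topic model over them.
# Hooked layers, pooling, count scaling, and plain LDA instead of PolyLDA
# are illustrative assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.decomposition import LatentDirichletAllocation

# ImageNet-pretrained ResNet-50 serves as the visual-attribute encoder.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Global-average-pool each kernel's feature map: one scalar per kernel.
        activations[name] = output.detach().mean(dim=(2, 3)).squeeze(0)
    return hook

# One residual stage per assumed "attribute level" (color/texture/pattern/...).
for stage in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(resnet, stage).register_forward_hook(make_hook(stage))

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def visual_document(image_path, max_count=10):
    """Return per-layer bag-of-visual-words count vectors for one image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        resnet(img)
    doc = {}
    for layer, act in activations.items():
        act = torch.relu(act)
        # Rescale activation strengths to integer "word counts" for the topic model.
        doc[layer] = (act / (act.max() + 1e-8) * max_count).round().int().tolist()
    return doc

# Hypothetical usage: fit one LDA per layer as a stand-in for PolyLDA's coupled
# per-language topics (fashion_image_paths is assumed to exist).
# docs = [visual_document(p) for p in fashion_image_paths]
# styles = LatentDirichletAllocation(n_components=20).fit_transform(
#     [d["layer3"] for d in docs])
```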
Pages: 10
Related papers (50 in total)
  • [1] Unsupervised Multi-modal Learning
    Iqbal, Mohammed Shameer
    ADVANCES IN ARTIFICIAL INTELLIGENCE (AI 2015), 2015, 9091 : 343 - 346
  • [2] Fashion Compatibility Modeling through a Multi-modal Try-on-guided Scheme
    Dong, Xue
    Wu, Jianlong
    Song, Xuemeng
    Dai, Hongjun
    Nie, Liqiang
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 771 - 780
  • [3] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171
  • [4] Unsupervised Multi-modal Neural Machine Translation
    Su, Yuanhang
    Fan, Kai
    Nguyen Bach
    Kuo, C-C Jay
    Huang, Fei
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10474 - 10483
  • [5] Deep Robust Unsupervised Multi-Modal Network
    Yang, Yang
    Wu, Yi-Feng
    Zhan, De-Chuan
    Liu, Zhi-Bin
    Jiang, Yuan
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 5652 - 5659
  • [6] Multi-modal Differentiable Unsupervised Feature Selection
    Yang, Junchen
    Lindenbaum, Ofir
    Kluger, Yuval
    Jaffe, Ariel
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2400 - 2410
  • [7] Multi-modal Joint Embedding for Fashion Product Retrieval
    Rubio, A.
    Yu, LongLong
    Simo-Serra, E.
    Moreno-Noguer, F.
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 400 - 404
  • [8] Multi-Modal Embedding for Main Product Detection in Fashion
    Rubio, Antonio
    Yu, LongLong
    Simo-Serra, Edgar
    Moreno-Noguer, Francesc
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 2236 - 2242
  • [9] Toward Multi-Modal Conditioned Fashion Image Translation
    Gu, Xiaoling
    Yu, Jun
    Wong, Yongkang
    Kankanhalli, Mohan S.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 2361 - 2371