Unsupervised multi-modal modeling of fashion styles with visual attributes

Cited: 4
Authors
Peng, Dunlu [1 ]
Liu, Rui [1 ]
Lu, Jing [1 ]
Zhang, Shuming [1 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fashion style modeling; Convolutional neural network; Polylingual topic model; Machine learning; LATENT;
DOI
10.1016/j.asoc.2021.108214
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fashion compatibility learning is of great practical significance for satisfying the needs of consumers and promoting the development of the apparel industry. As a core task within it, fashion style modeling has received extensive attention. In this work, we apply a polylingual topic model, PolyLDA, to discover fashion styles. To establish visual documents for fashion images, the model employs a convolutional neural network, ResNet-50, pre-trained on ImageNet. The kernels in different layers of the network encode visual attributes at different levels (such as color, texture, and pattern), so a particular kernel in a given layer can be expressed as a visual word (e.g., red, wavy, or floral design). Therefore, to construct the visual document for a fashion image, all kernels are treated directly as visual words and their activations are regarded as occurrences of the corresponding visual attributes. By minimizing the variance of the style distributions that PolyLDA infers on the training set, we learn a weight for each layer's visual attributes and assign these weights to the attributes of the different layers, which gives the model better modeling ability than the comparison models. Our proposed method is completely unsupervised and cost-saving. The experimental results show that the model not only produces almost the same results as manual discrimination but also achieves high satisfaction for similar-style retrieval. (C) 2021 Elsevier B.V. All rights reserved.
Pages: 10
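The abstract describes treating each convolution kernel of an ImageNet-pre-trained ResNet-50 as a visual word and its activation as the occurrence of a visual attribute, yielding one "visual document" per fashion image for PolyLDA. The following is a minimal sketch of how such documents might be constructed; it is not the authors' code, and the layer selection (layer1–layer3), the mean-activation statistic, and the activation threshold are illustrative assumptions.

```python
import torch
from torchvision import models, transforms
from torchvision.models.feature_extraction import create_feature_extractor
from PIL import Image

# Layers whose kernels serve as the visual-word vocabularies (assumed choice).
LAYERS = ["layer1", "layer2", "layer3"]

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
extractor = create_feature_extractor(model, return_nodes=LAYERS)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def visual_document(image_path: str, threshold: float = 0.5) -> dict:
    """Return {layer_name: [kernel ids]} -- the visual words active in one image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feats = extractor(x)                    # {layer_name: (1, C, H, W) feature map}
    doc = {}
    for name, fmap in feats.items():
        act = fmap.mean(dim=(2, 3)).squeeze(0)  # mean activation per kernel (channel)
        # A kernel whose mean activation exceeds the threshold contributes its visual word.
        doc[name] = (act > threshold).nonzero().flatten().tolist()
    return doc
```

Each layer's word list could then be handed to a polylingual topic model as one "language" of the same document, which is the role PolyLDA plays in the paper.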