Unsupervised multi-modal modeling of fashion styles with visual attributes

Cited: 4
Authors
Peng, Dunlu [1 ]
Liu, Rui [1 ]
Lu, Jing [1 ]
Zhang, Shuming [1 ]
Affiliation
[1] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fashion style modeling; Convolutional neural network; Polylingual topic model; Machine learning; LATENT;
DOI
10.1016/j.asoc.2021.108214
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fashion compatibility learning is of great practical significance for satisfying consumer needs and promoting the development of the apparel industry. As one of its core tasks, fashion style modeling has received extensive attention. In this work, we apply a polylingual topic model, PolyLDA, to discover fashion styles. To build visual documents for fashion images, we employ a convolutional neural network, ResNet-50, pre-trained on ImageNet. Kernels in different layers of the network encode visual attributes at different levels (such as color, texture, and pattern), so a particular kernel in a given layer can be expressed as a visual word (e.g., red, wavy, or floral design). To construct the visual document for a fashion image, all kernels are therefore treated directly as visual words, and a kernel's activation is regarded as an occurrence of the corresponding visual attribute. By minimizing the variance of the style distributions that PolyLDA infers on the training set, we learn a weight for the visual attributes of each layer, which gives the model better modeling ability than the comparative models. The proposed method is completely unsupervised and cost-effective. Experimental results show that the model not only produces results nearly identical to manual discrimination, but also achieves high satisfaction in similar-style retrieval. (C) 2021 Elsevier B.V. All rights reserved.
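The sketch below (not the authors' code) illustrates one way the abstract's idea could be realized: ResNet-50 kernel activations are collected per layer, and each layer's strongly activated channels form a bag of "visual words" that a polylingual topic model such as PolyLDA could consume, with one "language" per layer. The choice of layers (layer1-layer4), the global-average pooling of each channel, the top-k activation cutoff, and the function name visual_document are assumptions for illustration only.

```python
# Minimal sketch: building per-layer visual documents from ResNet-50
# activations. Layer selection, pooling, and the top-k threshold are
# illustrative assumptions, not the paper's exact procedure.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# Capture the output of each residual stage; every stage is treated as one
# "language" whose channels (kernels) play the role of visual words.
activations = {}
def make_hook(name):
    def hook(module, inp, out):
        activations[name] = out.detach()
    return hook

for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(model, name).register_forward_hook(make_hook(name))

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def visual_document(image_path, top_k=50):
    """Return {layer: [channel indices]} -- a bag of visual words per layer."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        model(img)
    doc = {}
    for name, feat in activations.items():
        # Global-average-pool each channel; a strongly activated channel is
        # counted as an occurrence of the corresponding visual attribute.
        strength = feat.mean(dim=(2, 3)).squeeze(0)          # shape: (channels,)
        words = torch.topk(strength, k=min(top_k, strength.numel())).indices
        doc[name] = words.tolist()
    return doc

# doc = visual_document("dress.jpg")
# The per-layer word lists would then be fed to a polylingual topic model,
# with layer weights tuned by minimizing the variance of the inferred
# style distributions over the training set, as the abstract describes.
```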
Pages: 10
Related Papers
50 records in total
  • [31] Multi-modal Correlation Modeling and Ranking for Retrieval
    Zhang, Hong
    Meng, Fanlian
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2009, 2009, 5879 : 637 - 646
  • [32] The multi-modal universe of fast-fashion: the Visuelle 2.0 benchmark
    Skenderi, Geri
    Joppi, Christian
    Denitto, Matteo
    Scarpa, Berniero
    Cristani, Marco
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 2240 - 2245
  • [33] Principle-to-program: Neural Fashion Recommendation with Multi-modal Input
    Chelliah, Muthusamy
    Biswas, Soma
    Dhakad, Lucky
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2706 - 2708
  • [34] MM-FRec: Multi-Modal Enhanced Fashion Item Recommendation
    Song, Xuemeng
    Wang, Chun
    Sun, Changchang
    Feng, Shanshan
    Zhou, Min
    Nie, Liqiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (10) : 10072 - 10084
  • [35] Dynamic Multi-modal Prompting for Efficient Visual Grounding
    Wu, Wansen
    Liu, Ting
    Wang, Youkai
    Xu, Kai
    Yin, Quanjun
    Hu, Yue
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII, 2024, 14431 : 359 - 371
  • [36] Visual Entity Linking via Multi-modal Learning
    Zheng, Qiushuo
    Wen, Hao
    Wang, Meng
    Qi, Guilin
    DATA INTELLIGENCE, 2022, 4 (01) : 1 - 19
  • [37] Interactive Multi-Modal Display Spaces for Visual Analysis
    Marrinan, Thomas
    Rizzi, Silvio
    Nishimoto, Arthur
    Johnson, Andrew
    Insley, Joseph A.
    Papka, Michael E.
    PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON INTERACTIVE SURFACES AND SPACES, (ISS 2016), 2016, : 421 - 426
  • [38] Multi-Modal Hallucination Control by Visual Information Grounding
    Favero, Alessandro
    Zancato, Luca
    Trager, Matthew
    Choudhary, Siddharth
    Perera, Pramuditha
    Achille, Alessandro
    Swaminathan, Ashwin
    Soatto, Stefano
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 14303 - 14312
  • [39] Multi-modal Pre-silicon Evaluation of Hardware Masking Styles
    Anik, Md Toufiq Hasan
    Reefat, Hasin Ishraq
    Cheng, Wei
    Danger, Jean-Luc
    Guilley, Sylvain
    Karimi, Naghmeh
    JOURNAL OF ELECTRONIC TESTING-THEORY AND APPLICATIONS, 2024, 40 (06): : 723 - 740
  • [40] A Study of Multi-modal Display System with Visual Feedback
    Tanikawa, Tomohiro
    Hirose, Michitaka
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION, 2008, : 285 - 292