Unsupervised multi-modal modeling of fashion styles with visual attributes

Cited: 4
Authors
Peng, Dunlu [1 ]
Liu, Rui [1 ]
Lu, Jing [1 ]
Zhang, Shuming [1 ]
Affiliations
[1] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 20093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fashion style modeling; Convolutional neural network; Polylingual topic model; Machine learning; LATENT
DOI
10.1016/j.asoc.2021.108214
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fashion compatibility learning is of great practical significance for satisfying consumer needs and promoting the development of the apparel industry. As one of its core tasks, fashion style modeling has received extensive attention. In this work, we apply a polylingual topic model, PolyLDA, to discover fashion styles. To establish visual documents for fashion images, the model employs ResNet-50, a convolutional neural network pre-trained on ImageNet. Kernels in different layers of the network encode visual attributes at different levels (such as color, texture, and pattern). Specifically, a particular kernel in a given layer can be expressed as a visual word (e.g., red, wavy, or floral design). Therefore, to construct the visual document for a fashion image, all kernels are treated directly as visual words, and a kernel's activation is regarded as the appearance of the corresponding visual attribute. By minimizing the variance of the style distributions that PolyLDA produces on the training set, we learn weights for the visual attributes of each layer and assign them to the attributes of the corresponding layers, giving the model better modeling ability than the comparative models. The proposed method is completely unsupervised and cost-saving. Experimental results show that the model not only produces results nearly identical to manual discrimination, but also achieves high satisfaction in similar-style retrieval. (C) 2021 Elsevier B.V. All rights reserved.
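The visual-document construction described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' released code: the function name, the use of mean activation per kernel, and the threshold rule are all assumptions; in the paper the activations would come from intermediate layers of a pre-trained ResNet-50 rather than the random arrays used here.

```python
import numpy as np

def build_visual_document(layer_activations, threshold=0.5):
    """Turn per-layer CNN feature maps into a bag of visual words.

    layer_activations: dict mapping a layer name to an array of shape
    (num_kernels, H, W), e.g. feature maps extracted from ResNet-50.
    Each kernel index is treated as one visual word; the word joins the
    document when the kernel's mean activation exceeds `threshold`
    (this thresholding rule is an illustrative assumption).
    Returns a dict: layer name -> sorted list of activated kernel indices.
    """
    document = {}
    for layer, fmap in layer_activations.items():
        # Mean activation of each kernel's spatial map.
        scores = fmap.reshape(fmap.shape[0], -1).mean(axis=1)
        document[layer] = [int(k) for k in np.flatnonzero(scores > threshold)]
    return document

# Toy example: random arrays stand in for real ResNet-50 feature maps.
rng = np.random.default_rng(0)
acts = {"conv3": rng.random((8, 4, 4)), "conv4": rng.random((16, 2, 2))}
doc = build_visual_document(acts, threshold=0.55)
```

Each per-layer word list would then serve as one "language" of the polylingual document fed to PolyLDA, with the learned per-layer weights rescaling the contribution of each layer's words.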
Pages: 10
Related Papers
(50 records in total)
  • [21] Evaluation Method of Teaching Styles Based on Multi-modal Fusion
    Tang, Wen
    Wang, Chongwen
    Zhang, Yi
    2021 THE 7TH INTERNATIONAL CONFERENCE ON COMMUNICATION AND INFORMATION PROCESSING, ICCIP 2021, 2021, : 9 - 15
  • [22] Unsupervised Trajectory Segmentation and Promoting of Multi-Modal Surgical Demonstrations
    Shao, Zhenzhou
    Zhao, Hongfa
    Xie, Jiexin
    Qu, Ying
    Guan, Yong
    Tan, Jindong
    2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 777 - 782
  • [23] Multi-modal unsupervised domain adaptation for semantic image segmentation
    Hu, Sijie
    Bonardi, Fabien
    Bouchafa, Samia
    Sidibe, Desire
    PATTERN RECOGNITION, 2023, 137
  • [24] Multi-Modal Joint Clustering With Application for Unsupervised Attribute Discovery
    Liu, Liangchen
    Nie, Feiping
    Wiliem, Arnold
    Li, Zhihui
    Zhang, Teng
    Lovell, Brian C.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (09) : 4345 - 4356
  • [25] Unsupervised Multi-modal Style Transfer for Cardiac MR Segmentation
    Chen, Chen
    Ouyang, Cheng
    Tarroni, Giacomo
    Schlemper, Jo
    Qiu, Huaqi
    Bai, Wenjia
    Rueckert, Daniel
    STATISTICAL ATLASES AND COMPUTATIONAL MODELS OF THE HEART: MULTI-SEQUENCE CMR SEGMENTATION, CRT-EPIGGY AND LV FULL QUANTIFICATION CHALLENGES, 2020, 12009 : 209 - 219
  • [26] Fast unsupervised multi-modal hashing based on piecewise learning
    Li, Yinan
    Long, Jun
    Tu, Zerong
    Yang, Zhan
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [27] Multi-Modal and Multi-Domain Embedding Learning for Fashion Retrieval and Analysis
    Gu, Xiaoling
    Wong, Yongkang
    Shou, Lidan
    Peng, Pai
    Chen, Gang
    Kankanhalli, Mohan S.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (06) : 1524 - 1537
  • [28] Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation
    Dong, Guan-Nan
    Pun, Chi-Man
    Zhang, Zheng
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2021, 16 : 4197 - 4210
  • [29] Multi-modal Preference Modeling for Product Search
    Guo, Yangyang
    Cheng, Zhiyong
    Nie, Liqiang
    Xu, Xin-Shun
    Kankanhalli, Mohan
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1865 - 1873
  • [30] MULTI-MODAL EAR AND FACE MODELING AND RECOGNITION
    Mahoor, Mohammad H.
    Cadavid, Steven
    Abdel-Mottaleb, Mohamed
    2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 4137 - +