Micro-video multi-label classification method based on multi-modal feature encoding

Cited by: 0
Authors
Jing P. [1]
Li Y. [1]
Su Y. [1]
Affiliation
[1] School of Electrical and Information Engineering, Tianjin University, Tianjin
Keywords
deep learning; micro-video; multi-label classification; multi-modal fusion; neural networks
DOI
10.19665/j.issn1001-2400.2022.04.013
Abstract
With the spread of smartphones and the mobile Internet, micro-videos have developed rapidly as a new form of user-generated content (UGC), and browsing them has become one of the most popular forms of entertainment. Micro-videos are naturally correlated across modalities and semantics, and making full use of this correlation is the key to micro-video representation learning. To better solve the multi-label classification task, a modal subspace encoding algorithm is proposed that integrates multi-modal subspace encoding and label semantic relevance learning in a unified framework. The algorithm uses a subspace encoding network to model the consistency and complementarity of the modalities while reducing redundancy and noise, which yields a common and complete representation of the fused modalities. Furthermore, a graph convolutional network built on a label correlation matrix learns the semantic relevance and representations of the labels, which then guide the multi-label classification task. Overall, the algorithm exploits both feature-level and label-level information to improve classification performance. The reconstruction loss and the multi-label classification loss are combined into a single objective, and experiments on a public dataset demonstrate the superiority of the proposed algorithm. © 2022 Science Press. All rights reserved.
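The abstract names three ingredients: a label correlation matrix, a graph convolutional network over labels, and a joint reconstruction-plus-classification loss. The NumPy sketch below illustrates one common way such pieces fit together; it is not the authors' implementation. The abstract does not say how the correlation matrix is built, so the co-occurrence estimate, the symmetric normalisation, the binary cross-entropy loss, and the weight `lam` are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def label_correlation(Y):
    """Assumed construction: A[i, j] estimates P(label j | label i) from co-occurrence."""
    Y = np.asarray(Y, dtype=float)
    co = Y.T @ Y                        # co[i, j] = samples carrying both labels i and j
    counts = np.diag(co).copy()         # counts[i] = samples carrying label i
    counts[counts == 0] = 1.0           # guard against labels that never occur
    return co / counts[:, None]

def gcn_layer(A, H, W):
    """One graph-convolution step: symmetric normalisation, linear map, ReLU."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)

def joint_loss(x_list, x_rec_list, logits, Y, lam=1.0):
    """Per-modality reconstruction MSE plus multi-label binary cross-entropy.

    lam is a hypothetical trade-off weight, not a value from the paper.
    """
    rec = sum(np.mean((x - xr) ** 2) for x, xr in zip(x_list, x_rec_list))
    p = 1.0 / (1.0 + np.exp(-logits))   # independent sigmoid per label
    eps = 1e-12
    bce = -np.mean(Y * np.log(p + eps) + (1 - Y) * np.log(1 - p + eps))
    return rec + lam * bce

# Toy usage with random stand-ins for the learned quantities.
rng = np.random.default_rng(0)
Y = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]], dtype=float)
A = label_correlation(Y)
label_emb = gcn_layer(A, rng.normal(size=(3, 5)), rng.normal(size=(5, 4)))
Z = rng.normal(size=(4, 4))             # stand-in for the fused common representation
logits = Z @ label_emb.T                # label embeddings act as per-label classifiers
x = [rng.normal(size=(4, 6))]           # one modality, for brevity
loss = joint_loss(x, [x[0] * 0.9], logits, Y)
```

Scoring each sample against the label embeddings is one way for label-level structure to "guide" classification; the paper may couple the two branches differently.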
Pages: 109-117
Page count: 8