Deconfounded Cross-modal Matching for Content-based Micro-video Background Music Recommendation

Authors
Yi, Jing [1]
Chen, Zhenzhong [1, 2, 3]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Luoyu Rd 129, Wuhan 430079, Hubei, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Luoyu Rd 129, Wuhan 430079, Hubei, Peoples R China
[3] Hubei Luojia Lab, Luoyu Rd 129, Wuhan 430079, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal matching; debiased recommender systems; knowledge distillation; variational auto-encoder
DOI
10.1145/3650042
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Object-oriented micro-video background music recommendation is a complicated task in which the degree of matching between videos and background music is a major issue. However, music selections in user-generated content (UGC) are prone to selection bias caused by the historical preferences of uploaders. Since historical preferences are not fully reliable and may reflect obsolete behaviors, over-reliance on them should be avoided as knowledge and interests dynamically evolve. In this article, we propose a Deconfounded Cross-Modal matching model to mitigate such bias. Specifically, uploaders' personal preferences for music genres are identified as confounders that spuriously correlate music embeddings and background music selections, causing the learned system to over-recommend music from majority groups. To resolve such confounders, backdoor adjustment is used to deconfound the spurious correlation between music embeddings and prediction scores. We further use a Monte Carlo estimator with a batch-level average as the approximation, which avoids integrating over the entire confounder space required by the adjustment. Furthermore, we design a teacher-student network that exploits the matching in music videos, which are professionally generated content (PGC) with specialized matching, to better recommend content-matching background music. The PGC data are modeled by a teacher network that guides the student network's matching of uploader-selected UGC data through Kullback-Leibler-based knowledge transfer. Extensive experiments on the TT-150k-genre dataset demonstrate the effectiveness of the proposed method. The code is publicly available at https://github.com/jing-1/DecCM.
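The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the additive way the confounder enters the score, and the dot-product scorer are all assumptions made for illustration. The first part replaces the expectation over confounders with a batch-level mean (exact only when the scorer is linear in the confounder), and the second part computes a Kullback-Leibler loss from teacher (PGC) scores to student (UGC) scores.

```python
import math

def batch_mean_confounder(confounders):
    """Average the confounder vectors observed in one batch (hypothetical helper)."""
    n = len(confounders)
    dim = len(confounders[0])
    return [sum(c[d] for c in confounders) / n for d in range(dim)]

def deconfounded_score(video_emb, music_emb, confounders):
    """Score a video-music pair with the confounder marginalised out.

    Assumption: the confounder is added to the music embedding and the
    score is a dot product, so E_u[f(m, u)] equals f(m, mean(u)).
    """
    u_bar = batch_mean_confounder(confounders)
    adjusted = [m + u for m, u in zip(music_emb, u_bar)]
    return sum(v * a for v, a in zip(video_emb, adjusted))

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL from the teacher's matching distribution to the student's."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return kl_divergence(p, q)
```

In a training loop under these assumptions, `distillation_loss` would be added to the student's matching objective, while `deconfounded_score` would replace the raw matching score.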
Pages: 25