Deconfounded Cross-modal Matching for Content-based Micro-video Background Music Recommendation

Cited by: 0
Authors
Yi, Jing [1 ]
Chen, Zhenzhong [1 ,2 ,3 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Luoyu Rd 129, Wuhan 430079, Hubei, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Luoyu Rd 129, Wuhan 430079, Hubei, Peoples R China
[3] Hubei Luojia Lab, Luoyu Rd 129, Wuhan 430079, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal matching; debiased recommender systems; knowledge distillation; variational auto-encoder;
DOI
10.1145/3650042
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object-oriented micro-video background music recommendation is a complicated task in which the matching degree between videos and background music is a major issue. However, music selections in user-generated content (UGC) are prone to selection bias caused by the historical preferences of uploaders. Since historical preferences are not fully reliable and may reflect obsolete behaviors, over-reliance on them should be avoided as knowledge and interests dynamically evolve. In this article, we propose a Deconfounded Cross-Modal matching model to mitigate such bias. Specifically, uploaders' personal preferences for music genres are identified as confounders that spuriously correlate music embeddings and background music selections, causing the learned system to over-recommend music from majority groups. To resolve such confounders, backdoor adjustment is utilized to deconfound the spurious correlation between music embeddings and prediction scores. We further utilize a Monte Carlo estimator with batch-level averaging as an approximation, to avoid integrating over the entire confounder space required by the adjustment. Furthermore, we design a teacher-student network that exploits the matching in music videos, which are professionally generated content (PGC) with expert-curated video-music pairings, to better recommend content-matching background music. The PGC data are modeled by a teacher network that guides the student network's matching on uploader-selected UGC data via Kullback-Leibler-based knowledge transfer. Extensive experiments on the TT-150k-genre dataset demonstrate the effectiveness of the proposed method. The code is publicly available at https://github.com/jing-1/DecCM
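The two mechanisms named in the abstract, backdoor adjustment approximated by a Monte Carlo / batch-level average over the genre confounder, and Kullback-Leibler-based teacher-student knowledge transfer, can be illustrated with a minimal PyTorch sketch. This is not the DecCM implementation (see the linked repository); the function names, the additive fusion of the expected genre embedding, the uniform genre prior in the usage snippet, and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def deconfounded_scores(video_emb, music_emb, genre_emb, genre_prior):
    """Backdoor-adjusted matching score (illustrative sketch).

    Rather than conditioning on the uploader's observed genre preference,
    the score marginalizes over the genre confounder:
        P(y | do(video, music)) ~= sum_g P(g) * f(video, music, g)
    and the sum is approximated in a single forward pass using the
    prior-weighted mean genre embedding (a batch-level-average style estimator).
    """
    # Prior-weighted expectation of the confounder embedding.
    # genre_prior: (num_genres,), genre_emb: (num_genres, d) -> (d,)
    expected_genre = genre_prior @ genre_emb
    # Placeholder fusion of the music representation with the expected confounder.
    adjusted_music = music_emb + expected_genre          # (batch, d)
    # Cosine-similarity matching score between video and adjusted music.
    return F.cosine_similarity(video_emb, adjusted_music, dim=-1)


def distillation_loss(student_logits, teacher_logits, tau=2.0):
    """KL-based knowledge transfer from the PGC teacher to the UGC student."""
    teacher_probs = F.softmax(teacher_logits / tau, dim=-1)
    student_log_probs = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * tau * tau


# Example shapes (hypothetical): 8 genres, 128-dim embeddings, batch of 4.
video = torch.randn(4, 128)
music = torch.randn(4, 128)
genres = torch.randn(8, 128)
prior = torch.full((8,), 1.0 / 8)                        # uniform prior, assumption only
scores = deconfounded_scores(video, music, genres, prior)
```

In the actual model, the confounder distribution would be estimated from the data rather than assumed uniform, and the teacher logits would come from the network trained on PGC music-video pairs.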
Pages: 25
Related Papers
50 records in total
  • [31] VideoTopic: Content-based Video Recommendation Using a Topic Model
    Zhu, Qiusha
    Shyu, Mei-Ling
    Wang, Haohong
    2013 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2013, : 219 - 222
  • [32] VideoTopic: Modeling User Interests for Content-Based Video Recommendation
    Zhu, Qiusha
    Shyu, Mei-Ling
    Wang, Haohong
    INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2014, 5 (04): 1 - 21
  • [33] Content-Based Video Recommendation System Based on Stylistic Visual Features
    Deldjoo, Yashar
    Elahi, Mehdi
    Cremonesi, Paolo
    Garzotto, Franca
    Piazzolla, Pietro
    Quadrana, Massimo
    JOURNAL ON DATA SEMANTICS, 2016, 5 (02) : 99 - 113
  • [34] A Dual-Path Cross-Modal Network for Video-Music Retrieval
    Gu, Xin
    Shen, Yinghua
    Lv, Chaohui
    SENSORS, 2023, 23 (02)
  • [35] AI-based Chinese-style music generation from video content: a study on cross-modal analysis and generation methods
    Cao, Moxi
    Zheng, Jiaxiang
    Zhang, Chongbin
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2025, 2025 (01)
  • [36] VIDEO-MUSIC RETRIEVAL WITH FINE-GRAINED CROSS-MODAL ALIGNMENT
    Era, Yuki
    Togo, Ren
    Maeda, Keisuke
    Ogawa, Takahiro
    Haseyama, Miki
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2005 - 2009
  • [37] Adaptive Anti-Bottleneck Multi-Modal Graph Learning Network for Personalized Micro-video Recommendation
    Cai, Desheng
    Qian, Shengsheng
    Fang, Quan
    Hu, Jun
    Xu, Changsheng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022
  • [38] A Short Video Classification Framework Based on Cross-Modal Fusion
    Pang, Nuo
    Guo, Songlin
    Yan, Ming
    Chan, Chien Aun
    SENSORS, 2023, 23 (20)
  • [39] AFFECTIVE VIDEO CONTENT ANALYSES BY USING CROSS-MODAL EMBEDDING LEARNING FEATURES
    Li, Benchao
    Chen, Zhenzhong
    Li, Shan
    Zheng, Wei-Shi
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 844 - 849
  • [40] An Approach for Music Recommendation Using Content-based Analysis and Collaborative Filtering
    Kim, Jaekwang
    Kim, Kunsu
    You, Kwan-Ho
    Lee, Jee-Hyong
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (05): 1985 - 1996