Enhancing Dynamic Image Advertising with Vision-Language Pre-training

Cited by: 4
Authors
Wen, Zhoufutu [1]
Zhao, Xinyu [2, 3]
Jin, Zhipeng [1]
Yang, Yi [1]
Jia, Wei [1]
Chen, Xiaodong [1]
Li, Shuanglong [1]
Liu, Lin [1]
Affiliations
[1] Baidu Inc, Baidu Search Ads, Beijing, Peoples R China
[2] Peking Univ, Beijing, Peoples R China
[3] Baidu Search Ads, Beijing, Peoples R China
Keywords
cross-modal retrieval; search advertising; image retrieval
DOI
10.1145/3539618.3591844
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the multimedia era, images are an effective medium in search advertising. Dynamic Image Advertising (DIA), a system that matches queries with ad images and generates multimodal ads, is introduced to improve user experience and ad revenue. The core of DIA is a query-image matching module that performs ad image retrieval and relevance modeling. Current query-image matching suffers from limited and inconsistent data and from insufficient cross-modal interaction. In addition, optimizing the retrieval and relevance models separately hurts overall performance. To address these issues, we propose a vision-language framework consisting of two parts. First, we train a base model on large-scale image-text pairs to learn general multimodal representations. Then, we fine-tune the base model on advertising business data, unifying relevance modeling and retrieval through multi-objective learning. Our framework has been implemented in the Baidu search advertising system "Phoenix Nest". Online evaluation shows that it improves cost per mille (CPM) and click-through rate (CTR) by 1.04% and 1.865%, respectively.
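The abstract does not spell out the objectives used in the fine-tuning stage. As a minimal sketch of what "unifying relevance modeling and retrieval through multi-objective learning" could look like, the example below assumes an in-batch contrastive loss for retrieval and a binary cross-entropy loss for relevance, combined as a weighted sum. The function names, the temperature value, and the weights `w_retrieval` and `w_relevance` are all hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def log_softmax_at(scores, idx):
    # Numerically stable log-softmax evaluated at one index.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[idx] - log_z


def retrieval_loss(query_embs, image_embs, temperature=0.07):
    """Symmetric in-batch contrastive loss (assumed retrieval objective).

    Pair i (query i, image i) is treated as the positive; every other
    image or query in the batch acts as a negative.
    """
    q = [normalize(v) for v in query_embs]
    im = [normalize(v) for v in image_embs]
    n = len(q)
    sims = [[dot(q[i], im[j]) / temperature for j in range(n)] for i in range(n)]
    q2i = -sum(log_softmax_at(sims[i], i) for i in range(n)) / n
    i2q = -sum(
        log_softmax_at([sims[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (q2i + i2q)


def relevance_loss(logits, labels):
    """Binary cross-entropy on relevance logits (assumed relevance objective)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def multi_objective_loss(query_embs, image_embs, rel_logits, rel_labels,
                         w_retrieval=1.0, w_relevance=1.0):
    # Hypothetical weighted sum of the two objectives.
    return (w_retrieval * retrieval_loss(query_embs, image_embs)
            + w_relevance * relevance_loss(rel_logits, rel_labels))
```

Under these assumptions, both objectives share one set of encoders, so a single fine-tuning run updates the retrieval and relevance behavior together instead of optimizing two separate models.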
Pages: 3310-3314
Page count: 5