Bid optimization using maximum entropy reinforcement learning

Cited by: 3
|
Authors
Liu, Mengjuan [1 ]
Liu, Jinyu [1 ]
Hu, Zhengning [1 ]
Ge, Yuchen [1 ]
Nie, Xuyun [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Network & Data Secur Key Lab Sichuan Prov, Chengdu 610054, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Real-time bidding; Bidding strategy; Maximum entropy reinforcement learning;
DOI
10.1016/j.neucom.2022.05.108
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Numbers
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-time bidding (RTB) has become a critical mechanism for online advertising. It allows advertisers to display their ads by bidding on ad impressions. Advertisers in RTB therefore seek an optimal bidding strategy to improve their cost-efficiency. Unfortunately, it is challenging to optimize the bidding strategy at the granularity of individual impressions due to the highly dynamic nature of the RTB environment. In this paper, we focus on optimizing a single advertiser's bidding strategy using a stochastic reinforcement learning (RL) algorithm. First, we utilize a widely adopted linear bidding function to compute every impression's base price and optimize it with a mutable adjustment factor, so that the bidding price reflects not only the impression's value to the advertiser but also the state of the RTB environment. Second, we use the maximum entropy RL algorithm (Soft Actor-Critic) to optimize every impression's adjustment factor, overcoming the convergence problems of deterministic RL algorithms. Finally, we evaluate the proposed strategy on a benchmark dataset (iPinYou); the results demonstrate that it obtained the most clicks in 9 of 12 experiments compared to the baselines. (c) 2022 Elsevier B.V. All rights reserved.
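The bidding scheme summarized in the abstract can be sketched as follows. This is a minimal illustration only: the function and variable names, and the multiplicative form of the adjustment, are assumptions rather than the paper's exact formulation, and the adjustment factor is supplied directly here rather than produced by a Soft Actor-Critic policy as in the paper.

```python
def linear_base_price(pctr: float, avg_ctr: float, base_bid: float) -> float:
    """Widely adopted linear bidding: bid in proportion to predicted CTR."""
    return base_bid * pctr / avg_ctr

def adjusted_bid(pctr: float, avg_ctr: float, base_bid: float,
                 adjustment: float) -> float:
    """Scale the base price by an adjustment factor; in the paper this
    factor is the action the SAC policy chooses for each impression."""
    return adjustment * linear_base_price(pctr, avg_ctr, base_bid)

# An impression predicted to be twice as clickable as average, with a
# base bid of 100 and a dampening adjustment factor of 0.8:
bid = adjusted_bid(pctr=0.002, avg_ctr=0.001, base_bid=100.0, adjustment=0.8)
print(bid)  # 160.0
```

Under this reading, the linear function captures the impression's value to the advertiser, while the learned factor adapts the price to current RTB market conditions.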
Citation
Pages: 529-543
Page count: 15
Related Papers
50 records total
  • [21] AdaBoost maximum entropy deep inverse reinforcement learning with truncated gradient
    Song, Li
    Li, Dazi
    Wang, Xiao
    Xu, Xin
    INFORMATION SCIENCES, 2022, 602 : 328 - 350
  • [22] Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning
    Bloem, Michael
    Bambos, Nicholas
    2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 4911 - 4916
  • [23] Adaptive Noise-based Evolutionary Reinforcement Learning With Maximum Entropy
    Wang, J.-Y.
    Wang, Z.
    Li, H.-X.
    Chen, C.-L.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (01): : 54 - 66
  • [24] Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning
    Zhou, Zhengyuan
    Bloem, Michael
    Bambos, Nicholas
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (09) : 2787 - 2802
  • [25] Historical Decision-Making Regularized Maximum Entropy Reinforcement Learning
    Dong, Botao
    Huang, Longyang
    Pang, Ning
    Chen, Hongtian
    Zhang, Weidong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [26] An effective maximum entropy exploration approach for deceptive game in reinforcement learning
    Li, Chunmao
    Wei, Xuanguang
    Zhao, Yinliang
    Geng, Xupeng
    NEUROCOMPUTING, 2020, 403 : 98 - 108
  • [27] Maximum Power Point Tracking Based on Reinforcement Learning Using Evolutionary Optimization Algorithms
    Bavarinos, Kostas
    Dounis, Anastasios
    Kofinas, Panagiotis
    ENERGIES, 2021, 14 (02)
  • [28] Learning to Play Text-Based Adventure Games with Maximum Entropy Reinforcement Learning
    Li, Weichen
    Devidze, Rati
    Fellenz, Sophie
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT IV, 2023, 14172 : 39 - 54
  • [29] Maximum Entropy-Based Reinforcement Learning Using a Confidence Measure in Speech Recognition for Telephone Speech
    Molina, Carlos
    Becerra Yoma, Nestor
    Huenupan, Fernando
    Garreton, Claudio
    Wuth, Jorge
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (05): : 1041 - 1052
  • [30] Maximum entropy-based optimal threshold selection using deterministic reinforcement learning with controlled randomization
    Yin, PY
    SIGNAL PROCESSING, 2002, 82 (07) : 993 - 1006