Bid optimization using maximum entropy reinforcement learning

Cited by: 3
|
Authors
Liu, Mengjuan [1 ]
Liu, Jinyu [1 ]
Hu, Zhengning [1 ]
Ge, Yuchen [1 ]
Nie, Xuyun [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Network & Data Secur Key Lab Sichuan Prov, Chengdu 610054, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Real-time bidding; Bidding strategy; Maximum entropy reinforcement learning;
DOI
10.1016/j.neucom.2022.05.108
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Numbers
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-time bidding (RTB) has become a critical mechanism for online advertising. It allows advertisers to display their ads by bidding on ad impressions. Advertisers in RTB therefore seek an optimal bidding strategy to improve their cost-efficiency. Unfortunately, it is challenging to optimize the bidding strategy at the granularity of individual impressions due to the highly dynamic nature of the RTB environment. In this paper, we focus on optimizing a single advertiser's bidding strategy using a stochastic reinforcement learning (RL) algorithm. First, we utilize a widely adopted linear bidding function to compute every impression's base price and optimize it with a mutable adjustment factor, so that the bidding price reflects not only the impression's value to the advertiser but also the state of the RTB environment. Second, we use the maximum entropy RL algorithm (Soft Actor-Critic) to optimize every impression's adjustment factor, overcoming the convergence problems of deterministic RL algorithms. Finally, we evaluate the proposed strategy on a benchmark dataset (iPinYou); the results demonstrate that it obtained the most clicks in 9 of 12 experiments compared to the baselines. (c) 2022 Elsevier B.V. All rights reserved.
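The bidding scheme summarized in the abstract can be sketched as follows. This is a minimal illustration only: the function and variable names, and the multiplicative form of the adjustment, are assumptions rather than the paper's exact formulation, and the adjustment factor is supplied directly here rather than produced by a Soft Actor-Critic policy as in the paper.

```python
def linear_base_price(pctr: float, avg_ctr: float, base_bid: float) -> float:
    """Widely adopted linear bidding: bid in proportion to predicted CTR."""
    return base_bid * pctr / avg_ctr

def adjusted_bid(pctr: float, avg_ctr: float, base_bid: float,
                 adjustment: float) -> float:
    """Scale the base price by an adjustment factor; in the paper this
    factor is the action the SAC policy chooses for each impression."""
    return adjustment * linear_base_price(pctr, avg_ctr, base_bid)

# An impression predicted to be twice as clickable as average, with a
# base bid of 100 and a dampening adjustment factor of 0.8:
bid = adjusted_bid(pctr=0.002, avg_ctr=0.001, base_bid=100.0, adjustment=0.8)
print(bid)  # 160.0
```

Under this reading, the linear function captures the impression's value to the advertiser, while the learned factor adapts the price to current RTB market conditions.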
Citation
Pages: 529-543
Page count: 15
Related Papers
50 records total
  • [21] AdaBoost maximum entropy deep inverse reinforcement learning with truncated gradient
    Song, Li
    Li, Dazi
    Wang, Xiao
    Xu, Xin
    INFORMATION SCIENCES, 2022, 602 : 328 - 350
  • [22] Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning
    Bloem, Michael
    Bambos, Nicholas
    2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 4911 - 4916
  • [23] Adaptive Noise-based Evolutionary Reinforcement Learning With Maximum Entropy
    Wang, J.-Y.
    Wang, Z.
    Li, H.-X.
    Chen, C.-L.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (01): : 54 - 66
  • [24] Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning
    Zhou, Zhengyuan
    Bloem, Michael
    Bambos, Nicholas
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (09) : 2787 - 2802
  • [25] Historical Decision-Making Regularized Maximum Entropy Reinforcement Learning
    Dong, Botao
    Huang, Longyang
    Pang, Ning
    Chen, Hongtian
    Zhang, Weidong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [26] An effective maximum entropy exploration approach for deceptive game in reinforcement learning
    Li, Chunmao
    Wei, Xuanguang
    Zhao, Yinliang
    Geng, Xupeng
    NEUROCOMPUTING, 2020, 403 : 98 - 108
  • [27] Maximum Power Point Tracking Based on Reinforcement Learning Using Evolutionary Optimization Algorithms
    Bavarinos, Kostas
    Dounis, Anastasios
    Kofinas, Panagiotis
    ENERGIES, 2021, 14 (02)
  • [28] Learning to Play Text-Based Adventure Games with Maximum Entropy Reinforcement Learning
    Li, Weichen
    Devidze, Rati
    Fellenz, Sophie
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT IV, 2023, 14172 : 39 - 54
  • [29] Maximum Entropy-Based Reinforcement Learning Using a Confidence Measure in Speech Recognition for Telephone Speech
    Molina, Carlos
    Becerra Yoma, Nestor
    Huenupan, Fernando
    Garreton, Claudio
    Wuth, Jorge
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (05): : 1041 - 1052
  • [30] Maximum entropy-based optimal threshold selection using deterministic reinforcement learning with controlled randomization
    Yin, PY
    SIGNAL PROCESSING, 2002, 82 (07) : 993 - 1006