MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed

Cited by: 0

Authors:

Shi, Xiaowen ^{[1]}

Wang, Ze ^{[1]}

Cai, Yuanying ^{[1,2]}

Wu, Xiaoxu ^{[1]}

Yang, Fan ^{[1]}

Liao, Guogang ^{[1]}

Wang, Yongkang ^{[1]}

Wang, Xingxing ^{[1]}

Wang, Dong ^{[1]}

Affiliations:

[1] Meituan, Beijing, Peoples R China

[2] Tsinghua Univ, IIIS, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023

Keywords:

Reinforcement Learning; Multi-Distribution Data Learning; Position Allocation;

DOI:

10.1145/3539618.3592018

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

The mainstream approach in position allocation systems is to use a reinforcement learning (RL) model to allocate appropriate positions to items from various channels and then mix them into the feed. Two types of data are used to train the RL model for position allocation: strategy data and random data. Strategy data is collected from the current online model and suffers from an imbalanced distribution of state-action pairs, which results in severe overestimation problems during training. Random data, on the other hand, offers a more uniform distribution of state-action pairs but is difficult to obtain in industrial scenarios, because random exploration can hurt platform revenue and user experience. Because the two types of data have different distributions, designing a strategy that leverages both to improve RL model training is a highly challenging problem. In this study, we propose a framework named Multi-Distribution Data Learning (MDDL) to address the challenge of effectively using both strategy and random data when training RL models on mixed multi-distribution data. Specifically, MDDL incorporates a novel imitation learning signal to mitigate overestimation problems in strategy data and maximizes the RL signal for random data to facilitate effective learning. In our experiments, we evaluated the proposed MDDL framework in a real-world position allocation system and demonstrated its superior performance compared to the previous baseline. MDDL has been fully deployed on the Meituan food delivery platform and currently serves over 300 million users.
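
Illustrative note: the abstract does not give the loss functions, so the following is only a minimal sketch of the general idea of treating the two data sources differently, not the authors' formulation. It assumes a Q-learning setup in which every sample contributes a one-step temporal-difference (TD) loss, and strategy samples additionally contribute a cross-entropy imitation term toward the logged action. The function names, the `source` field, `gamma`, and `imitation_weight` are all hypothetical.

```python
import math


def td_loss(q_values, action, reward, next_q_values, gamma=0.9):
    """Squared one-step TD error for a single transition (assumed RL signal)."""
    target = reward + gamma * max(next_q_values)
    return (q_values[action] - target) ** 2


def imitation_loss(q_values, action):
    """Cross-entropy between softmax(Q) and the logged action (assumed imitation signal)."""
    m = max(q_values)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(q - m) for q in q_values))
    return log_z - q_values[action]


def mixed_loss(batch, imitation_weight=1.0):
    """Average loss over a batch that mixes strategy and random samples."""
    total = 0.0
    for s in batch:
        # Every sample gets the TD term.
        loss = td_loss(s["q"], s["action"], s["reward"], s["next_q"])
        # Hypothetical choice: only strategy samples get the imitation term.
        if s["source"] == "strategy":
            loss += imitation_weight * imitation_loss(s["q"], s["action"])
        total += loss
    return total / len(batch)


batch = [
    {"source": "random", "q": [1.0, 2.0], "action": 0, "reward": 1.0, "next_q": [0.0, 0.0]},
    {"source": "strategy", "q": [0.0, 0.0], "action": 1, "reward": 0.0, "next_q": [0.0, 0.0]},
]
print(mixed_loss(batch))
```

In this toy batch both TD errors are zero, so the whole loss comes from the imitation term on the strategy sample. How MDDL actually weights or constructs the two signals is described in the paper itself.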


Pages: 2159-2163

Page count: 5
