Optimal Routing to Parallel Servers With Unknown Utilities-Multi-Armed Bandit With Queues

Cited by: 2
Authors
Fu, Xinzhe [1]
Modiano, Eytan [1]
Affiliations
[1] MIT, Lab Informat & Decis Syst, Cambridge, MA 02139 USA
Keywords
Queueing analysis; optimization methods; SHORTEST-QUEUE; JOIN; ALLOCATION; POWER; TIME;
DOI
10.1109/TNET.2022.3227136
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
We consider the optimal routing problem in a discrete-time system with a job dispatcher connected to M parallel servers. At every time slot, the job dispatcher sends the incoming jobs to a server for execution, and each server has a queue that stores its jobs. The arrival process of incoming jobs and the service processes of the servers are stochastic, with unknown and possibly heterogeneous rates. Each server s_m is associated with an underlying utility v_m that is initially unknown. Whenever server s_m completes a job, a utility of v_m is obtained and a noisy observation of v_m is received. The goal is to design a policy that makes routing decisions to maximize the total utility obtained by the end of a finite time horizon T. The performance of a policy is measured in terms of regret, which is the additive difference between the expected total utility obtained by the policy and the supremum of the expected total utility over all policies. The optimal routing problem can be interpreted as a multi-armed bandit problem with queues, where each server is viewed as an arm and the completion of a job is viewed as a pull of an arm. The key distinction between the optimal routing problem and traditional multi-armed bandit problems lies in the queueing dynamics at the servers, which arise from the stochastic nature of the arrival and service processes. Our results combine techniques from the control of stochastic queueing systems and from stochastic multi-armed bandits to provide insights into the design and analysis of policies for the optimal routing problem. We first present analytical bounds that link the regret to the utilization and queue lengths of the servers. Next, assuming that the ordering of the underlying utilities is known, we introduce the Priority-K routing policy, which makes priority-based routing decisions: it sends the incoming jobs to the server with the highest underlying utility among those whose queue length is no larger than a threshold K. We prove that Priority-K achieves O(log T) regret with an appropriately chosen K. Then, removing the assumption of known utility ordering, we propose the Upper-Confidence Priority-K policy, which essentially combines the Priority-K policy with an ordering based on the upper-confidence bounds of the underlying utilities, and we establish that the Upper-Confidence Priority-K policy achieves an instance-dependent O(log^3 T) regret. Finally, we extend our results to a generalized version of the optimal routing problem with multiple job dispatchers in a bipartite network. Our theoretical results are also validated by simulations.
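The routing rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `priority_k_route` and `ucb_order`, the choice to return `None` when every queue exceeds K, and the UCB1-style confidence radius sqrt(2 log t / n) are all assumptions made for illustration; the abstract does not specify the fallback behavior or the exact form of the confidence bound.

```python
import math
from typing import List, Optional, Sequence


def priority_k_route(queue_lengths: Sequence[int],
                     utility_order: Sequence[int],
                     K: int) -> Optional[int]:
    """Pick the highest-priority server whose queue length is at most K.

    utility_order lists server indices from highest to lowest utility.
    Returning None when no server qualifies is an assumption of this sketch.
    """
    for m in utility_order:
        if queue_lengths[m] <= K:
            return m
    return None


def ucb_order(mean_estimates: Sequence[float],
              completions: Sequence[int],
              t: int) -> List[int]:
    """Order servers by an upper-confidence index, highest first.

    Uses the standard UCB1 radius sqrt(2 log t / n) as a stand-in
    (assumption); servers with no completed jobs get an infinite index.
    """
    def index(m: int) -> float:
        n = completions[m]
        if n == 0:
            return math.inf
        return mean_estimates[m] + math.sqrt(2.0 * math.log(t) / n)

    return sorted(range(len(mean_estimates)), key=lambda m: -index(m))


if __name__ == "__main__":
    # Known ordering: server 0 has the highest utility but its queue is too long.
    print(priority_k_route([3, 0, 1], [0, 1, 2], K=2))
    # Unknown ordering: estimate it from observations, then route.
    order = ucb_order([0.5, 0.9, 0.1], [10, 10, 10], t=100)
    print(priority_k_route([0, 3, 0], order, K=2))
```

In this sketch the threshold K trades utility against queue buildup: a larger K sends more jobs to the server believed to be best, at the cost of longer queues there.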
Pages: 1997-2012
Number of pages: 16