Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret

Cited by: 204
Authors
Anandkumar, Animashree [1]
Michael, Nithin [2]
Tang, Kevin [2]
Swami, Ananthram [3]
Affiliations
[1] Univ Calif Irvine, Ctr Pervas Commun & Comp, Dept Elect Engn & Comp Sci, Irvine, CA 92697 USA
[2] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14853 USA
[3] USA, Res Lab, Adelphi, MD 20783 USA
Keywords
Cognitive medium access control; multi-armed bandits; distributed algorithms; logarithmic regret; MULTIARMED BANDIT PROBLEM; EFFICIENT ALLOCATION RULES; MULTIPLE PLAYS; REWARDS
DOI
10.1109/JSAC.2011.110406
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
The problem of distributed learning and channel access is considered in a cognitive network with multiple secondary users. The availability statistics of the channels are initially unknown to the secondary users and are estimated using sensing decisions. There is no explicit information exchange or prior agreement among the secondary users and sensing and access decisions are undertaken by them in a completely distributed manner. We propose policies for distributed learning and access which achieve order-optimal cognitive system throughput (number of successful secondary transmissions) under self play, i.e., when implemented at all the secondary users. Equivalently, our policies minimize the sum regret in distributed learning and access, which is the loss in secondary throughput due to learning and distributed access. For the scenario when the number of secondary users is known to the policy, we prove that the total regret is logarithmic in the number of transmission slots. This policy achieves order-optimal regret based on a logarithmic lower bound for regret under any uniformly-good learning and access policy. We then consider the case when the number of secondary users is fixed but unknown, and is estimated at each user through feedback. We propose a policy whose sum regret grows only slightly faster than logarithmic in the number of transmission slots.
Pages: 731-745
Page count: 15
Related papers
50 items in total
  • [1] Logarithmic-Regret Quantum Learning Algorithms for Zero-Sum Games
    Gao, Minbo
    Ji, Zhengfeng
    Li, Tongyang
    Wang, Qisheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [2] Logarithmic regret algorithms for online convex optimization
    Elad Hazan
    Amit Agarwal
    Satyen Kale
    Machine Learning, 2007, 69 : 169 - 192
  • [3] Logarithmic regret algorithms for online convex optimization
    Hazan, Elad
    Agarwal, Amit
    Kale, Satyen
    MACHINE LEARNING, 2007, 69 (2-3) : 169 - 192
  • [4] Logarithmic regret algorithms for online convex optimization
    Hazan, Elad
    Kalai, Adam
    Kale, Satyen
    Agarwal, Amit
    LEARNING THEORY, PROCEEDINGS, 2006, 4005 : 499 - 513
  • [5] Q-learning with Logarithmic Regret
    Yang, Kunhe
    Yang, Lin F.
    Du, Simon S.
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [6] Reciprocal Learning for Cognitive Medium Access
    Chen, Xianfu
    Zhao, Zhifeng
    Grace, David
    Zhang, Honggang
    2013 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2013 : 89 - 94
  • [7] Distributed Adaptive Algorithms for Optimal Opportunistic Medium Access
    Al-Harthi, Yahya
    Borst, Sem
    2009 7TH INTERNATIONAL SYMPOSIUM ON MODELING AND OPTIMIZATION IN MOBILE, AD HOC, AND WIRELESS, 2009 : 346 - +
  • [8] Distributed Adaptive Algorithms for Optimal Opportunistic Medium Access
    Al-Harthi, Yahya
    Borst, Sem
    Whiting, Phil
    MOBILE NETWORKS & APPLICATIONS, 2011, 16 (2) : 217 - 230
  • [9] Distributed Adaptive Algorithms for Optimal Opportunistic Medium Access
    Yahya Al-Harthi
    Sem Borst
    Phil Whiting
    Mobile Networks and Applications, 2011, 16 : 217 - 230
  • [10] Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits
    Wu, Huasen
    Srikant, R.
    Liu, Xin
    Jiang, Chong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28