50 entries in total
- [3] Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies. Soft Computing, 2019, 23: 3591-3604
- [4] Polynomial-time reinforcement learning of near-optimal policies. Eighteenth National Conference on Artificial Intelligence (AAAI-02)/Fourteenth Innovative Applications of Artificial Intelligence Conference (IAAI-02), Proceedings, 2002: 205-210