Approval-directed agency and the decision theory of Newcomb-like problems

Cited by: 2
Authors
Oesterheld, Caspar [1,2]
Affiliations
[1] Fdn Res Inst, Berlin, Germany
[2] Duke Univ, Durham, NC 27708 USA
Keywords
Reinforcement learning; Causal decision theory; Evidential decision theory; Newcomb's problem; AI safety; Philosophical foundations of AI
DOI
10.1007/s11229-019-02148-2
Chinese Library Classification
N09 [History of Natural Science]; B [Philosophy and Religion]
Subject classification codes
01; 0101; 010108; 060207; 060305; 0712
Abstract
Decision theorists disagree about how instrumentally rational agents, i.e., agents trying to achieve some goal, should behave in so-called Newcomb-like problems, with the main contenders being causal and evidential decision theory. Since the main goal of artificial intelligence research is to create machines that make instrumentally rational decisions, the disagreement pertains to this field. In addition to the more philosophical question of what the right decision theory is, the goal of AI poses the question of how to implement any given decision theory in an AI. For example, how would one go about building an AI whose behavior matches evidential decision theory's recommendations? Conversely, we can ask which decision theories (if any) describe the behavior of any existing AI design. In this paper, we study what decision theory an approval-directed agent, i.e., an agent whose goal it is to maximize the score it receives from an overseer, implements. If we assume that the overseer rewards the agent based on the expected value of some von Neumann-Morgenstern utility function, then such an approval-directed agent is guided by two decision theories: the one used by the agent to decide which action to choose in order to maximize the reward and the one used by the overseer to compute the expected utility of a chosen action. We show which of these two decision theories describes the agent's behavior in which situations.
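
A minimal illustrative sketch (not from the paper): the abstract describes an architecture in which an overseer scores the agent's chosen action with the expected utility of that action under some decision theory, while the agent maximizes the score it anticipates receiving. The Python toy below plays this setup out on Newcomb's problem; the predictor accuracy, prize values, and the assumption that the agent can anticipate the overseer's score exactly are illustrative choices, not details taken from the paper.

    """Toy approval-directed agent facing Newcomb's problem.
    All numbers and the simplified reward setup are assumptions for illustration."""

    PREDICTOR_ACCURACY = 0.99   # assumed reliability of the predictor
    PRIOR_BOX_FILLED = 0.5      # CDT's fixed credence that the opaque box is full
    BIG_PRIZE = 1_000_000       # contents of the opaque box if it was filled
    SMALL_PRIZE = 1_000         # contents of the transparent box

    ACTIONS = ("one-box", "two-box")

    def expected_utility(action: str, theory: str) -> float:
        """Expected monetary utility of `action` under EDT or CDT conditioning."""
        if theory == "EDT":
            # Evidential: the action is evidence about the prediction,
            # and hence about whether the opaque box was filled.
            p_filled = PREDICTOR_ACCURACY if action == "one-box" else 1 - PREDICTOR_ACCURACY
        elif theory == "CDT":
            # Causal: the prediction is already fixed, so the action
            # carries no evidence about the box's contents.
            p_filled = PRIOR_BOX_FILLED
        else:
            raise ValueError(f"unknown decision theory: {theory}")
        utility = p_filled * BIG_PRIZE
        if action == "two-box":
            utility += SMALL_PRIZE
        return utility

    def overseer_score(action: str, overseer_theory: str) -> float:
        """The overseer rewards the chosen action with its expected utility,
        computed under the overseer's own decision theory."""
        return expected_utility(action, overseer_theory)

    def approval_directed_choice(overseer_theory: str) -> str:
        """The approval-directed agent picks the action whose anticipated
        overseer score is highest.  In this toy setting the reward depends only
        on the overseer's computation, so the overseer's decision theory is the
        one visible in the agent's behavior."""
        return max(ACTIONS, key=lambda a: overseer_score(a, overseer_theory))

    if __name__ == "__main__":
        for theory in ("EDT", "CDT"):
            print(f"overseer uses {theory} -> agent chooses {approval_directed_choice(theory)}")

With these numbers, an evidentially scoring overseer leads the approval-directed agent to one-box, while a causally scoring overseer leads it to two-box; the paper analyses the situations in which the agent's own decision theory, rather than the overseer's, ends up describing its behavior.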
Pages: 6491-6504
Number of pages: 14