Reasoning with Language Model is Planning with World Model

Authors
Hao, Shibo [1]
Gu, Yi [1]
Ma, Haodi [2]
Hong, Joshua Jiahua [1]
Wang, Zhen [1, 3]
Wang, Daisy Zhe [2]
Hu, Zhiting [1]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
[2] Univ Florida, Gainesville, FL 32611 USA
[3] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have shown remarkable reasoning capabilities, particularly with chain-of-thought (CoT) prompting. However, LLMs sometimes still struggle with problems that are easy for humans, such as generating action plans to achieve given goals in an environment, or performing complex math or logical reasoning. The deficiency stems from the key fact that LLMs lack an internal world model to predict the world state (e.g., environment status, intermediate variable values) and simulate long-term outcomes of actions. This prevents LLMs from performing deliberate planning akin to human brains, which involves exploring alternative reasoning paths, anticipating future states and rewards, and iteratively refining existing reasoning steps. To overcome these limitations, we propose a new LLM reasoning framework, Reasoning via Planning (RAP). RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm based on Monte Carlo Tree Search for strategic exploration in the vast reasoning space. During reasoning, the LLM (as agent) incrementally builds a reasoning tree under the guidance of the LLM (as world model) and rewards, and efficiently obtains a high-reward reasoning path with a proper balance between exploration and exploitation. We apply RAP to various challenging reasoning problems including plan generation, math reasoning, and logical inference, and demonstrate its superiority over strong baselines. RAP with LLaMA-33B even surpasses CoT with GPT-4, achieving 33% relative improvement in a plan generation setting.
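The search loop the abstract describes (an agent proposing actions, a world model predicting the resulting states, and rewards guiding Monte Carlo Tree Search) can be illustrated with a toy sketch. This is not the authors' implementation: the arithmetic task, the `ACTIONS` table, the `world_model` and `reward` functions, the greedy rollout policy, and the UCT constant are all assumptions chosen so the example runs without an LLM. In RAP itself, the LLM would play the roles that these stand-in functions fill here.

```python
import math

# Hypothetical toy task: reach TARGET from a start number in at most MAX_DEPTH steps.
ACTIONS = {"+1": lambda s: s + 1, "*2": lambda s: s * 2}
TARGET, MAX_DEPTH = 8, 3


def world_model(state, action):
    """Stand-in for the LLM-as-world-model: predict the next state."""
    return ACTIONS[action](state)


def reward(state):
    """Toy reward in (0, 1]; equals 1.0 only when the target is reached."""
    return 1.0 / (1.0 + abs(TARGET - state))


class Node:
    def __init__(self, state, depth, parent=None, action=None):
        self.state, self.depth = state, depth
        self.parent, self.action = parent, action
        self.children = []
        self.untried = [] if depth == MAX_DEPTH else list(ACTIONS)
        self.visits, self.total = 0, 0.0

    def uct_child(self, c=1.4):
        # Trade off exploitation (mean reward) against exploration (visit counts).
        return max(
            self.children,
            key=lambda ch: ch.total / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def plan_to(node):
    """Collect the actions on the path from the root to this node."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return actions[::-1]


def rollout(state, depth):
    """Greedy simulation (an assumption): take the action whose predicted state scores best."""
    actions = []
    while depth < MAX_DEPTH:
        a = max(ACTIONS, key=lambda act: reward(world_model(state, act)))
        state = world_model(state, a)
        actions.append(a)
        depth += 1
    return state, actions


def mcts(start, iterations=50):
    root = Node(start, 0)
    best_reward, best_plan, best_state = -1.0, [], start
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one child for an untried action.
        if node.untried:
            a = node.untried.pop(0)
            child = Node(world_model(node.state, a), node.depth + 1, node, a)
            node.children.append(child)
            node = child
        # 3. Simulation: roll out to the depth limit and score the final state.
        final_state, tail = rollout(node.state, node.depth)
        r = reward(final_state)
        if r > best_reward:
            best_reward, best_plan, best_state = r, plan_to(node) + tail, final_state
        # 4. Backpropagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    return best_plan, best_state, best_reward
```

Calling `mcts(1)` returns the highest-reward trajectory seen during search together with its final state and reward. Returning the best trajectory encountered, rather than the most-visited path, is one of several possible ways to read a plan out of the tree and is a design choice of this sketch.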
Pages: 8154-8173
Number of pages: 20