Reasoning with Language Model is Planning with World Model

Cited by: 0
Authors
Hao, Shibo [1 ]
Gu, Yi [1 ]
Ma, Haodi [2 ]
Hong, Joshua Jiahua [1 ]
Wang, Zhen [1 ,3 ]
Wang, Daisy Zhe [2 ]
Hu, Zhiting [1 ]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
[2] Univ Florida, Gainesville, FL 32611 USA
[3] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have shown remarkable reasoning capabilities, particularly with chain-of-thought (CoT) prompting. However, LLMs sometimes still struggle with problems that are easy for humans, such as generating action plans to achieve given goals in an environment, or performing complex math or logical reasoning. The deficiency stems from the key fact that LLMs lack an internal world model to predict the world state (e.g., environment status, intermediate variable values) and simulate long-term outcomes of actions. This prevents LLMs from performing deliberate planning akin to human brains, which involves exploring alternative reasoning paths, anticipating future states and rewards, and iteratively refining existing reasoning steps. To overcome the limitations, we propose a new LLM reasoning framework, Reasoning via Planning (RAP). RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm based on Monte Carlo Tree Search for strategic exploration in the vast reasoning space. During reasoning, the LLM (as agent) incrementally builds a reasoning tree under the guidance of the LLM (as world model) and rewards, and efficiently obtains a high-reward reasoning path with a proper balance between exploration vs. exploitation. We apply RAP to various challenging reasoning problems including plan generation, math reasoning, and logical inference, and demonstrate its superiority over strong baselines. RAP with LLaMA-33B even surpasses CoT with GPT-4, achieving 33% relative improvement in a plan generation setting.
Pages: 8154-8173
Page count: 20
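The abstract describes RAP as Monte Carlo Tree Search over reasoning states, with the same LLM prompted as an agent (proposing reasoning steps) and as a world model (predicting the next state and supplying rewards). The Python sketch below is a rough illustration of that idea only, not the authors' released implementation; llm_propose_actions, llm_next_state, and llm_reward are hypothetical placeholders standing in for those LLM prompts.

```python
# Minimal MCTS-guided reasoning sketch in the spirit of RAP (illustrative, not the paper's code).
# llm_propose_actions, llm_next_state, llm_reward are hypothetical callables wrapping LLM prompts.
import math
import random

class Node:
    def __init__(self, state, action=None, parent=None):
        self.state = state          # world state predicted by the LLM-as-world-model
        self.action = action        # reasoning step (action) that produced this state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def uct(self, c=1.4):
        # UCT score: mean reward (exploitation) plus an exploration bonus.
        if self.visits == 0:
            return float("inf")
        return (self.total_reward / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def depth(node):
    d = 0
    while node.parent is not None:
        d, node = d + 1, node.parent
    return d

def mcts_reason(root_state, llm_propose_actions, llm_next_state, llm_reward,
                n_iters=50, max_depth=6):
    root = Node(root_state)
    for _ in range(n_iters):
        # 1. Selection: descend the tree, always taking the child with the highest UCT score.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: LLM-as-agent proposes next reasoning steps; LLM-as-world-model
        #    predicts the state each step would lead to.
        if node.visits > 0 and depth(node) < max_depth:
            for action in llm_propose_actions(node.state):
                node.children.append(Node(llm_next_state(node.state, action), action, node))
            if node.children:
                node = random.choice(node.children)
        # 3. Evaluation: score the reached state with an LLM-derived reward.
        reward = llm_reward(node.state)
        # 4. Backpropagation: update visit counts and rewards along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Extract the highest-mean-reward path as the final reasoning trace.
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda n: n.total_reward / max(n.visits, 1))
        path.append(node.action)
    return path
```

In the paper, the reward can combine signals such as the likelihood of a proposed step and the LLM's self-evaluation of the resulting state; the sketch leaves that choice inside the hypothetical llm_reward.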
Related papers (50 in total)
  • [21] Extended object model for case-based reasoning in process planning
    Xu, Zhi-Wei
    Liu, Wen-Jian
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2002, 8(02): 115-121
  • [22] Strong modularity and circular reasoning pervade the planning-control model
    Ramenzoni, VC
    Riley, MA
    BEHAVIORAL AND BRAIN SCIENCES, 2004, 27(01): 48+
  • [23] Large language model-based evolutionary optimizer: Reasoning with elitism
    Brahmachary, Shuvayan
    Joshi, Subodh M.
    Panda, Aniruddha
    Koneripalli, Kaushik
    Sagotra, Arun Kumar
    Patel, Harshil
    Sharma, Ankush
    Jagtap, Ameya D.
    Kalyanaraman, Kaushic
    NEUROCOMPUTING, 2025, 622
  • [24] Democratizing Reasoning Ability: Tailored Learning from Large Language Model
    Wang, Zhaoyang
    Huang, Shaohan
    Liu, Yuxuan
    Wang, Jiahai
    Song, Minghui
    Zhang, Zihan
    Huang, Haizhen
    Wei, Furu
    Deng, Weiwei
    Sun, Feng
    Zhang, Qi
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 1948-1966
  • [25] Large Language Model Influence on Diagnostic Reasoning A Randomized Clinical Trial
    Goh, Ethan
    Gallo, Robert
    Hom, Jason
    Strong, Eric
    Weng, Yingjie
    Kerman, Hannah
    Cool, Josephine A.
    Kanjee, Zahir
    Parsons, Andrew S.
    Ahuja, Neera
    Horvitz, Eric
    Yang, Daniel
    Milstein, Arnold
    Olson, Andrew P. J.
    Rodman, Adam
    Chen, Jonathan H.
    JAMA NETWORK OPEN, 2024, 7 (10)
  • [26] Distilling Multi-Step Reasoning Capabilities into Smaller Language Model
    Yim, Yauwai
    Wang, Zirui
    2024 16TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, ICMLC 2024, 2024: 530-535
  • [27] LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs
    Wang, Yan
    Chu, Zhixuan
    Ouyang, Xin
    Wang, Simeng
    Hao, Hongyan
    Shen, Yue
    Gu, Jinjie
    Xue, Siqiao
    Zhang, James
    Cui, Qing
    Li, Longfei
    Zhou, Jun
    Li, Sheng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024: 19189-19196
  • [28] World Model as a Graph: Learning Latent Landmarks for Planning
    Zhang, Lunjun
    Yang, Ge
    Stadie, Bradly
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [29] Behavior planning of intelligent agent with sign world model
    Panov, Aleksandr I.
    BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES, 2017, 19 : 21 - 31
  • [30] A Model of Language-Group Interaction and Evolution Including Language Acquisition Planning
    Wyburn, John
    Hayward, John
    JOURNAL OF MATHEMATICAL SOCIOLOGY, 2010, 34(03): 167-200