A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis

Authors
Stolfo, Alessandro [1]
Belinkov, Yonatan [2]
Sachan, Mrinmaya [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Technion IIT, Haifa, Israel
Funding
Israel Science Foundation; Swiss National Science Foundation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture. In order to improve our understanding of this aspect of language models, we present a mechanistic interpretation of Transformer-based LMs on arithmetic questions using a causal mediation analysis framework. By intervening on the activations of specific model components and measuring the resulting changes in predicted probabilities, we identify the subset of parameters responsible for specific predictions. This provides insights into how information related to arithmetic is processed by LMs. Our experimental results indicate that LMs process the input by transmitting the information relevant to the query from mid-sequence early layers to the final token using the attention mechanism. Then, this information is processed by a set of MLP modules, which generate result-related information that is incorporated into the residual stream. To assess the specificity of the observed activation dynamics, we compare the effects of different model components on arithmetic queries with other tasks, including number retrieval from prompts and factual knowledge questions.
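Illustrative note: the intervention procedure the abstract describes (override one component's activation, then measure how the predicted probability changes) can be sketched on a toy model. This is not the authors' code. The names `toy_model`, `comp_a`, `comp_b`, and `indirect_effect` are hypothetical, the two components are hand-written stand-ins for attention and MLP blocks, and the effect is measured as a plain probability difference, which may differ from the metric used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x, patch=None):
    """Toy stand-in for an LM (an assumption, not a real Transformer).

    Two named components write into a 2-dimensional residual stream.
    `patch` maps a component name to an activation that replaces the
    one the component would have computed on this input.
    """
    patch = patch or {}
    acts = {
        "comp_a": patch.get("comp_a", [2.0 * x[0], 0.0]),
        "comp_b": patch.get("comp_b", [0.0, 0.1 * x[1]]),
    }
    residual = [x[i] + acts["comp_a"][i] + acts["comp_b"][i] for i in range(2)]
    # Logits over two candidate answers.
    logits = [residual[0] - residual[1], residual[1] - residual[0]]
    return softmax(logits), acts


def indirect_effect(x_clean, x_alt, component, answer_idx):
    """Drop in P(answer) on the clean input when `component` is forced
    to the activation it takes on the alternative input."""
    p_clean, _ = toy_model(x_clean)
    _, alt_acts = toy_model(x_alt)
    p_patched, _ = toy_model(x_clean, patch={component: alt_acts[component]})
    return p_clean[answer_idx] - p_patched[answer_idx]


x_clean, x_alt = [1.0, 0.0], [0.0, 1.0]
effects = {
    name: indirect_effect(x_clean, x_alt, name, answer_idx=0)
    for name in ("comp_a", "comp_b")
}
for name, value in effects.items():
    print(f"{name}: {value:.4f}")
```

In this toy setting the component with the larger effect is the one that mediates more of the change in the prediction; ranking components this way is the general idea behind localizing which parts of a model are responsible for a given output.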
Pages: 7035-7052
Page count: 18