A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis

Cited by: 0
Authors
Stolfo, Alessandro [1 ]
Belinkov, Yonatan [2 ]
Sachan, Mrinmaya [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Technion IIT, Haifa, Israel
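Funding
Israel Science Foundation; Swiss National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code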
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture. In order to improve our understanding of this aspect of language models, we present a mechanistic interpretation of Transformer-based LMs on arithmetic questions using a causal mediation analysis framework. By intervening on the activations of specific model components and measuring the resulting changes in predicted probabilities, we identify the subset of parameters responsible for specific predictions. This provides insights into how information related to arithmetic is processed by LMs. Our experimental results indicate that LMs process the input by transmitting the information relevant to the query from mid-sequence early layers to the final token using the attention mechanism. Then, this information is processed by a set of MLP modules, which generate result-related information that is incorporated into the residual stream. To assess the specificity of the observed activation dynamics, we compare the effects of different model components on arithmetic queries with other tasks, including number retrieval from prompts and factual knowledge questions.
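The intervention idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_model` is a hypothetical three-component stand-in rather than a Transformer, the component names `template`, `attn`, and `mlp` are invented, and the effect measure (the drop in probability of the original answer) is a simplification that is not claimed to be the metric used in the paper. The sketch records a component's activation on an alternative prompt, patches it into a run on the original prompt, and measures how the predicted probability changes.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_model(a, b, patch=None):
    """Hypothetical toy 'model' for the prompt 'a+b=' (not a real Transformer).

    `patch` maps a component name to an activation recorded in another run;
    a patched component outputs that activation instead of computing its own.
    """
    patch = patch or {}
    acts = {}
    # Operand-independent component: identical across prompts.
    acts["template"] = patch.get("template", "a+b=")
    # Stand-in for attention moving operand information to the last position.
    acts["attn"] = patch.get("attn", (a, b))
    # Stand-in for MLP modules producing result-related information.
    acts["mlp"] = patch.get("mlp", sum(acts["attn"]))
    # Distribution over candidate results 0..18, peaked at the MLP output.
    logits = [-2.0 * abs(r - acts["mlp"]) for r in range(19)]
    return softmax(logits), acts

def patching_effect(clean, alt, component):
    """Drop in probability of the clean answer after patching one component."""
    p_clean, _ = toy_model(*clean)
    _, alt_acts = toy_model(*alt)
    p_patched, _ = toy_model(*clean, patch={component: alt_acts[component]})
    answer = sum(clean)
    return p_clean[answer] - p_patched[answer]

effects = {c: patching_effect((3, 4), (2, 6), c)
           for c in ("template", "attn", "mlp")}
print(effects)
```

In this toy setting, patching the operand-independent `template` component leaves the prediction unchanged, while patching `attn` or `mlp` shifts probability away from the original answer, which is the kind of contrast used to attribute a prediction to specific components.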
Pages: 7035 - 7052
Number of pages: 18
Related papers
50 in total
  • [21] Using mechanistic models for the clinical interpretation of complex genomic variation
    María Peña-Chilet
    Marina Esteban-Medina
    Matias M. Falco
    Kinza Rian
    Marta R. Hidalgo
    Carlos Loucera
    Joaquín Dopazo
    Scientific Reports, 9
  • [22] Estimation of causal mediation effects for a dichotomous outcome in multiple-mediator models using the mediation formula
    Wang, Wei
    Nelson, Suchitra
    Albert, Jeffrey M.
    STATISTICS IN MEDICINE, 2013, 32 (24) : 4211 - 4228
  • [23] Commentary: Using Potential Outcomes to Understand Causal Mediation Analysis
    Imai, Kosuke
    Jo, Booil
    Stuart, Elizabeth A.
    MULTIVARIATE BEHAVIORAL RESEARCH, 2011, 46 (05) : 861 - 873
  • [24] Estimating Causal Effects in Mediation Analysis Using Propensity Scores
    Coffman, Donna L.
    STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 2011, 18 (03) : 357 - 369
  • [25] Using Large Language Models for the Interpretation of Building Regulations
    Fuchs, Stefan
    Witbrock, Michael
    Dimyadi, Johannes
    Amor, Robert
    Journal of Engineering, Project, and Production Management, 2024, 14 (04)
  • [26] Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
    Finlayson, Matthew
    Mueller, Aaron
    Gehrmann, Sebastian
    Shieber, Stuart
    Linzen, Tal
    Belinkov, Yonatan
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 1828 - 1843
  • [27] Assessing the translatability of In vivo cardiotoxicity mechanisms to In vitro models using causal reasoning
    Enayetallah, Ahmed E.
    Puppala, Dinesh
    Ziemek, Daniel
    Fischer, James E.
    Kantesaria, Sheila
    Pletcher, Mathew T.
    BMC PHARMACOLOGY & TOXICOLOGY, 2013, 14
  • [28] Assessing the translatability of In vivo cardiotoxicity mechanisms to In vitro models using causal reasoning
    Ahmed E Enayetallah
    Dinesh Puppala
    Daniel Ziemek
    James E Fischer
    Sheila Kantesaria
    Mathew T Pletcher
    BMC Pharmacology and Toxicology, 14
  • [29] Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition
    Muffo, Matteo
    Cocco, Aldo
    Bertino, Enrico
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 291 - 297
  • [30] Reducing Bias in Sentiment Analysis Models Through Causal Mediation Analysis and Targeted Counterfactual Training
    Da, Yifei
    Bossa, Matias Nicolas
    Berenguer, Abel Diaz
    Sahli, Hichem
    IEEE ACCESS, 2024, 12 : 10120 - 10134