A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis

Cited by: 0

Authors:

Stolfo, Alessandro [1]

Belinkov, Yonatan [2]

Sachan, Mrinmaya [1]

Affiliations:

[1] Swiss Fed Inst Technol, Zurich, Switzerland

[2] Technion IIT, Haifa, Israel

Source:

2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023

Funding:

Israel Science Foundation; Swiss National Science Foundation

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture. In order to improve our understanding of this aspect of language models, we present a mechanistic interpretation of Transformer-based LMs on arithmetic questions using a causal mediation analysis framework. By intervening on the activations of specific model components and measuring the resulting changes in predicted probabilities, we identify the subset of parameters responsible for specific predictions. This provides insights into how information related to arithmetic is processed by LMs. Our experimental results indicate that LMs process the input by transmitting the information relevant to the query from mid-sequence early layers to the final token using the attention mechanism. Then, this information is processed by a set of MLP modules, which generate result-related information that is incorporated into the residual stream. To assess the specificity of the observed activation dynamics, we compare the effects of different model components on arithmetic queries with other tasks, including number retrieval from prompts and factual knowledge questions.
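The intervention procedure described in the abstract (replace one component's activation, then measure how the predicted probabilities change) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `ToyModel`, its two components `a` and `b`, the random weights, and the simple probability-difference measure are hypothetical and are not the models, components, or effect metric used in the paper.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class ToyModel:
    """Hypothetical two-component model with a residual stream (not a real LM)."""

    def __init__(self, d_in=4, d_hidden=8, n_out=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_a = rng.normal(size=(d_in, d_hidden))      # component "a"
        self.W_b = rng.normal(size=(d_hidden, d_hidden))  # component "b"
        self.W_out = rng.normal(size=(d_hidden, n_out))   # output projection

    def forward(self, x, patch=None):
        """Run the model; `patch` maps a component name to a substitute activation."""
        patch = patch or {}
        a = patch["a"] if "a" in patch else np.tanh(x @ self.W_a)
        b = patch["b"] if "b" in patch else np.tanh(a @ self.W_b)
        residual = a + b  # both components write into the residual stream
        probs = softmax(residual @ self.W_out)
        return probs, {"a": a, "b": b}


def indirect_effect(model, x_clean, x_alt, component, target):
    """Change in P(target) on the clean input when `component` is given
    the activation it had on the alternative input (a simplified measure)."""
    p_clean, _ = model.forward(x_clean)
    _, alt_acts = model.forward(x_alt)
    p_patched, _ = model.forward(x_clean, patch={component: alt_acts[component]})
    return float(p_patched[target] - p_clean[target])


model = ToyModel()
rng = np.random.default_rng(1)
x_clean = rng.normal(size=4)  # stands in for the original query
x_alt = rng.normal(size=4)    # stands in for a query with changed operands

p_clean, clean_acts = model.forward(x_clean)
p_alt, _ = model.forward(x_alt)
target = int(np.argmax(p_clean))  # the prediction on the clean input

for name in ("a", "b"):
    effect = indirect_effect(model, x_clean, x_alt, name, target)
    print(f"component {name}: change in P(target) = {effect:+.4f}")
```

In this toy setup, patching component `a` reproduces the alternative run exactly, because `b` is recomputed from the patched `a`; patching only `b` isolates the contribution that `b` mediates. Comparing such per-component effects is the general idea behind locating the components responsible for a prediction.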

Citation

Pages: 7035 - 7052

Page count: 18

50 items in total

[1] Towards Analysis and Interpretation of Large Language Models for Arithmetic Reasoning
Akter, Mst Shapna
Shahriar, Hossain
Cuzzocrea, Alfredo
2024 11TH IEEE SWISS CONFERENCE ON DATA SCIENCE, SDS 2024, 2024, : 267 - 270
[2] Investigating Gender Bias in Language Models Using Causal Mediation Analysis
Vig, Jesse
Gehrmann, Sebastian
Belinkov, Yonatan
Qian, Sharon
Nevo, Daniel
Singer, Yaron
Shieber, Stuart
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
[3] Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
Hou, Yifan
Li, Jiaoda
Fei, Yu
Stolfo, Alessandro
Zhou, Wangchunshu
Zeng, Guangtao
Bosselut, Antoine
Sachan, Mrinmaya
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 4902 - 4919
[4] Causal Reasoning in Large Language Models using Causal Graph Retrieval Augmented Generation
Samarajeewa, Chamod
De Silva, Daswin
Osipov, Evgeny
Alahakoon, Damminda
Manic, Milos
2024 16TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION, HSI 2024, 2024,
[5] CLADDER: Assessing Causal Reasoning in Language Models
Jin, Zhijing
Chen, Yuen
Leeb, Felix
Gresele, Luigi
Kamal, Ojasv
Lyu, Zhiheng
Blin, Kevin
Gonzalez, Fernando
Kleiman-Weiner, Max
Sachan, Mrinmaya
Schoelkopf, Bernhard
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[6] Are Large Language Models Capable of Causal Reasoning for Sensing Data Analysis?
Hu, Zhizhang
Zhang, Yue
Rossi, Ryan
Yu, Tong
Kim, Sungchul
Pan, Shijia
PROCEEDINGS OF THE 2024 WORKSHOP ON EDGE AND MOBILE FOUNDATION MODELS, EDGEFM 2024, 2024, : 24 - 29
[7] A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models
Stolfo, Alessandro
Jin, Zhijing
Shridhar, Kumar
Scholkopf, Bernhard
Sachan, Mrinmaya
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 545 - 561
[8] Causal Models for Mediation Analysis: An Introduction to Structural Mean Models
Zheng, Cheng
Atkins, David C.
Zhou, Xiao-Hua
Rhew, Isaac C.
MULTIVARIATE BEHAVIORAL RESEARCH, 2015, 50 (06) : 614 - 631
[9] Causal reasoning using geometric analysis
Kara, LB
Stahovich, TF
AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING, 2002, 16 (05) : 363 - 384
[10] Identifying and Mitigating Annotation Bias in Natural Language Understanding using Causal Mediation Analysis
Lim, Sitiporn Sae
Udomcharoenchaikit, Can
Limkonchotiwat, Peerat
Chuangsuwanich, Ekapol
Nutanong, Sarana
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 11548 - 11563
