Probing for Bridging Inference in Transformer Language Models

Cited: 0
Authors
Pandit, Onkar [1 ]
Hou, Yufang [2 ]
Affiliations
[1] Univ Lille, CNRS, Cent Lille, INRIA Lille,UMR 9189,CRIStAL, F-59000 Lille, France
[2] IBM Res Europe, Dublin, Ireland
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We probe pre-trained transformer language models for bridging inference. We first investigate individual attention heads in BERT and observe that attention heads at higher layers focus on bridging relations more prominently than those at lower and middle layers; moreover, a few specific attention heads concentrate consistently on bridging. More importantly, our second approach considers language models as a whole: bridging anaphora resolution is formulated as a masked token prediction task (Of-Cloze test). This formulation produces promising results without any fine-tuning, indicating that pre-trained language models substantially capture bridging inference. Our further investigation shows that the distance between anaphor and antecedent, as well as the context provided to the language model, plays an important role in the inference.
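
As an illustration of the masked-token formulation described in the abstract, the following Python sketch probes a pre-trained BERT for a bridging-style cloze prediction. It assumes the Of-Cloze setup can be approximated by appending an "of [MASK]" phrase to the anaphor and ranking candidate antecedent heads by the probability the model assigns to them at the mask position; the model name, example sentence, and candidate list are illustrative and not taken from the paper.

# Sketch: score candidate antecedents with a pre-trained masked LM (no fine-tuning).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Context plus the anaphor recast as an "of [MASK]" phrase (hypothetical example).
text = "I bought a house last year. The kitchen of [MASK] is small."
candidates = ["house", "year"]  # hypothetical antecedent heads

inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]
probs = torch.softmax(logits, dim=-1)

# Rank each candidate by the probability of its first word piece at the [MASK] slot.
for cand in candidates:
    cand_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(cand)[0])
    print(f"{cand}: {probs[cand_id].item():.4f}")

In this sketch the candidate with the highest masked-token probability would be taken as the predicted antecedent; the paper's actual candidate selection and context windows may differ.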
Pages: 4153-4163
Number of pages: 11