Dissecting Contextual Word Embeddings: Architecture and Representation

Cited by: 0
Authors
Peters, Matthew E. [1 ]
Neumann, Mark [1 ]
Zettlemoyer, Luke [2 ]
Yih, Wen-tau [1 ]
Affiliations
[1] Allen Inst Artificial Intelligence, Seattle, WA 98103 USA
[2] Univ Washington, Paul G Allen Comp Sci & Engn, Seattle, WA 98195 USA
Source
2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018) | 2018
Keywords
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self-attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high-quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphology-based at the word embedding layer, through local syntax in the lower contextual layers, to longer-range semantics such as coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
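The depth-wise trend the abstract describes (morphology at the embedding layer, local syntax in the lower layers, semantics such as coreference at the upper layers) can be probed by extracting per-layer representations from a pre-trained bidirectional model. Below is a minimal sketch, assuming the HuggingFace transformers library and a BERT checkpoint as a convenient stand-in for the paper's own LSTM, CNN, and self-attention biLMs; the model name and the two example sentences are illustrative choices, not the paper's setup.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

def layer_vectors(sentence: str, word: str) -> list:
    """Return the target word's vector at every layer (embeddings + each block)."""
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    idx = tokens.index(word)  # simplified lookup; assumes `word` is a single token
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # tuple: layer 0 (embeddings) through last block
    return [h[0, idx] for h in hidden]

# Same surface word in two different senses: if upper layers encode more
# contextual semantics, the cross-context similarity should fall with depth.
a = layer_vectors("The bank raised interest rates.", "bank")
b = layer_vectors("They sat on the river bank.", "bank")
for layer, (va, vb) in enumerate(zip(a, b)):
    sim = torch.cosine_similarity(va, vb, dim=0).item()
    print(f"layer {layer:2d}: cross-context cosine similarity {sim:.3f}")

Note that the paper's own analysis relies on linear probing classifiers and unsupervised coreference and similarity tests per layer; the cosine comparison above is only a lightweight proxy for the same layer-wise behavior.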
Pages: 1499-1509
Page count: 11
Related Papers
50 items in total
  • [31] The Role of Contextual Word Embeddings in Correcting the 'de/da' Clitic Errors in Turkish
    Ozturk, Hasan
    Degirmenci, Alperen
    Gungor, Onur
    Uskudarli, Suzan
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020
  • [32] Character-to-Word Representation and Global Contextual Representation for Named Entity Recognition
    Chang, Jun
    Han, Xiaohong
    NEURAL PROCESSING LETTERS, 2023, 55 (07) : 8551 - 8567
  • [34] Text Representation Models based on the Spatial Distributional Properties of Word Embeddings
    Unnam, Narendra Babu
    Reddy, P. Krishna
    Pandey, Amit
    Manwani, Naresh
    PROCEEDINGS OF 7TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA, CODS-COMAD 2024, 2024, : 603 - 604
  • [35] Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information
    Ali, Sarwan
    INFORMATION MANAGEMENT AND BIG DATA, SIMBIG 2023, 2024, 2142 : 30 - 45
  • [36] A Hierarchical Book Representation of Word Embeddings for Effective Semantic Clustering and Search
    Bleiweiss, Avi
    ICAART: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2, 2017, : 154 - 163
  • [37] Improved Word Sense Disambiguation via Prompt-based Contextual Word Representation
    He, Qipeng
    Zhang, Jian
    Huang, Xueting
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [38] Sentiment Classification with Medical Word Embeddings and Sequence Representation for Drug Reviews
    Liu, Sisi
    Lee, Ickjai
    HEALTH INFORMATION SCIENCE (HIS 2018), 2018, 11148 : 75 - 86
  • [39] Learning distributed word representation with multi-contextual mixed embedding
    Li, Jianqiang
    Li, Jing
    Fu, Xianghua
    Masud, M. A.
    Huang, Joshua Zhexue
    KNOWLEDGE-BASED SYSTEMS, 2016, 106 : 220 - 230
  • [40] Set-Word Embeddings and Semantic Indices: A New Contextual Model for Empirical Language Analysis
    de Cordoba, Pedro Fernandez
    Perez, Carlos A. Reyes
    Arnau, Claudia Sanchez
    Perez, Enrique A. Sanchez
    COMPUTERS, 2025, 14 (01)