Interpretable Entity Representations through Large-Scale Typing

Authors
Onoe, Yasumasa [1]
Durrett, Greg [1]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In standard methodology for natural language processing, entities in text are typically embedded in dense vector spaces with pre-trained models. The embeddings produced this way are effective when fed into downstream models, but they require end-task fine-tuning and are fundamentally difficult to interpret. In this paper, we present an approach to creating entity representations that are human readable and achieve high performance on entity-related tasks out of the box. Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types, indicating the confidence of a typing model's decision that the entity belongs to the corresponding type. We obtain these representations using a fine-grained entity typing model, trained either on supervised ultra-fine entity typing data (Choi et al., 2018) or distantly-supervised examples from Wikipedia. On entity probing tasks involving recognizing entity identity, our embeddings used in parameter-free downstream models achieve competitive performance with ELMo- and BERT-based embeddings in trained models. We also show that it is possible to reduce the size of our type set in a learning-based way for particular domains. Finally, we show that these embeddings can be post-hoc modified through a small number of rules to incorporate domain knowledge and improve performance.
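The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The five-type inventory, the probability values, the cosine-similarity comparison, and the `apply_rule` helper are all assumptions made for illustration; the abstract states only that each vector entry is a typing model's probability for one type, that downstream models are parameter-free, and that a small number of rules can modify the vectors post hoc.

```python
import math

# Hypothetical, tiny type inventory (assumption); real type sets are far larger.
TYPES = ["person", "politician", "athlete", "organization", "location"]

def cosine(u, v):
    """Cosine similarity: a comparison with no trainable parameters."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_types(vec, k=2):
    """Return the k most probable type names, which makes the vector readable."""
    ranked = sorted(zip(TYPES, vec), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def apply_rule(vec, if_type, then_type, threshold=0.5):
    """Illustrative post-hoc rule (assumption): if `if_type` is confident,
    raise `then_type` to at least the same probability."""
    out = list(vec)
    i, j = TYPES.index(if_type), TYPES.index(then_type)
    if out[i] >= threshold:
        out[j] = max(out[j], out[i])
    return out

# Made-up probabilities standing in for a typing model's output on three mentions.
mention_a = [0.95, 0.90, 0.02, 0.01, 0.03]  # reads as a politician
mention_b = [0.97, 0.85, 0.05, 0.02, 0.01]  # reads as a politician
mention_c = [0.02, 0.01, 0.01, 0.10, 0.96]  # reads as a location

same_entity_score = cosine(mention_a, mention_b)
different_entity_score = cosine(mention_a, mention_c)
```

In this sketch, `top_types(mention_a)` lists the highest-probability type names for a mention, and comparing `same_entity_score` with `different_entity_score` shows how two mentions can be scored against each other without any trained downstream parameters.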
Pages: 612-624
Page count: 13