XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

Authors
El-Kishky, Ahmed [1]
Renduchintala, Adithya [2]
Cross, James [2]
Guzman, Francisco [2]
Koehn, Philipp [3]
Affiliations
[1] Twitter Cortex, San Francisco, CA 94103 USA
[2] Facebook AI, Menlo Park, CA USA
[3] Johns Hopkins Univ, Baltimore, MD 21218 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual named-entity lexica are an important resource for multilingual NLP tasks such as machine translation and cross-lingual wikification. While knowledge bases contain a large number of entities in high-resource languages such as English and French, corresponding entities for lower-resource languages are often missing. To address this, we propose Lexical-Semantic-Phonetic Align (LSP-Align), a technique to automatically mine cross-lingual entity lexica from mined web data. We demonstrate that LSP-Align outperforms baselines at extracting cross-lingual entity pairs, and we mine 164 million entity pairs from 120 different languages aligned with English. We release these cross-lingual entity pairs along with the massively multilingual tagged named entity corpus as a resource to the NLP community.
Pages: 10424-10430
Page count: 7