XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

Times cited: 0
Authors
El-Kishky, Ahmed [1]
Renduchintala, Adithya [2]
Cross, James [2]
Guzman, Francisco [2]
Koehn, Philipp [3]
Affiliations
[1] Twitter Cortex, San Francisco, CA 94103 USA
[2] Facebook AI, Menlo Park, CA USA
[3] Johns Hopkins Univ, Baltimore, MD 21218 USA
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual named-entity lexica are an important resource for multilingual NLP tasks such as machine translation and cross-lingual wikification. While knowledge bases contain a large number of entities in high-resource languages such as English and French, corresponding entities for lower-resource languages are often missing. To address this, we propose Lexical-Semantic-Phonetic Align (LSP-Align), a technique to automatically mine cross-lingual entity lexica from mined web data. We demonstrate that LSP-Align outperforms baselines at extracting cross-lingual entity pairs, and we use it to mine 164 million entity pairs from 120 different languages aligned with English. We release these cross-lingual entity pairs, along with the massively multilingual tagged named-entity corpus, as a resource to the NLP community.
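The abstract names LSP-Align but does not spell out how its three signals are scored or combined. As a purely illustrative sketch, not the authors' algorithm, the snippet below shows one way lexical, semantic, and phonetic similarity could be merged into a single word-alignment score and turned into entity-pair links. Everything in it is a hypothetical assumption: the equal weights, the crude `phonetic_key` heuristic, the `TOY_EMBEDDINGS` vectors (stand-ins for a real cross-lingual embedding model), the mutual-argmax linking rule, and the 0.5 threshold. It also assumes both sides are already in Latin script; other scripts would need transliteration first.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy vectors standing in for a cross-lingual embedding model.
TOY_EMBEDDINGS = {
    "london": [0.9, 0.1, 0.0], "londres": [0.88, 0.12, 0.02],
    "city": [0.1, 0.9, 0.1], "ville": [0.12, 0.85, 0.15],
    "the": [0.0, 0.1, 0.95], "la": [0.05, 0.1, 0.9],
}


def lexical_sim(a: str, b: str) -> float:
    """Character-overlap ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def phonetic_key(word: str) -> str:
    """Hypothetical crude phonetic key: keep the first letter, drop later
    vowels and h/w/y, and collapse adjacent repeated consonants."""
    word = word.lower()
    if not word:
        return ""
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiouyhw" or ch == key[-1]:
            continue
        key += ch
    return key


def phonetic_sim(a: str, b: str) -> float:
    """Overlap ratio of the two phonetic keys."""
    return SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def semantic_sim(a: str, b: str) -> float:
    """Cosine similarity of toy embeddings; 0.0 if either word is unknown."""
    va, vb = TOY_EMBEDDINGS.get(a), TOY_EMBEDDINGS.get(b)
    if va is None or vb is None:
        return 0.0
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def combined_score(a: str, b: str, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of lexical, semantic, and phonetic similarity."""
    wl, ws, wp = weights
    return wl * lexical_sim(a, b) + ws * semantic_sim(a, b) + wp * phonetic_sim(a, b)


def align(src, tgt, weights=(1 / 3, 1 / 3, 1 / 3), threshold=0.5):
    """Mutual-argmax alignment: keep (i, j) only if j is the best target for i,
    i is the best source for j, and the combined score clears the threshold."""
    if not src or not tgt:
        return []
    scores = [[combined_score(s, t, weights) for t in tgt] for s in src]
    links = []
    for i, row in enumerate(scores):
        j = max(range(len(tgt)), key=lambda k: row[k])
        best_i = max(range(len(src)), key=lambda k: scores[k][j])
        if best_i == i and row[j] >= threshold:
            links.append((src[i], tgt[j], round(row[j], 3)))
    return links


if __name__ == "__main__":
    print(align(["the", "city", "london"], ["la", "ville", "londres"]))
```

On the toy sentence pair `["the", "city", "london"]` / `["la", "ville", "londres"]`, only `london`/`londres` clears the threshold: it scores on all three signals, whereas the common-noun pairs score almost entirely on the semantic one. Requiring agreement in both directions (mutual argmax) is one simple way to trade recall for precision when mining pairs at scale.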
Pages: 10424-10430
Page count: 7