XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

Authors
El-Kishky, Ahmed [1]
Renduchintala, Adithya [2]
Cross, James [2]
Guzman, Francisco [2]
Koehn, Philipp [3]
Affiliations
[1] Twitter Cortex, San Francisco, CA 94103 USA
[2] Facebook AI, Menlo Pk, CA USA
[3] Johns Hopkins Univ, Baltimore, MD 21218 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual named-entity lexica are an important resource for multilingual NLP tasks such as machine translation and cross-lingual wikification. While knowledge bases contain a large number of entities in high-resource languages such as English and French, corresponding entities for lower-resource languages are often missing. To address this, we propose Lexical-Semantic-Phonetic Align (LSP-Align), a technique to automatically mine cross-lingual entity lexica from mined web data. We demonstrate that LSP-Align outperforms baselines at extracting cross-lingual entity pairs, and we mine 164 million entity pairs from 120 different languages aligned with English. We release these cross-lingual entity pairs along with the massively multilingual tagged named entity corpus as a resource to the NLP community.
Pages: 10424-10430
Page count: 7