Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision

Cited by: 4
Authors
Aysa, Anwar [1]
Ablimit, Mijit [1]
Yilahun, Hankiz [1]
Hamdulla, Askar [1]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
Keywords
bilingual dictionary; seed dictionary; cross-language word embedding;
DOI
10.3390/info13040175
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Bilingual lexicon extraction is useful, especially for low-resource languages that can leverage resources from high-resource languages. Uyghur is a derivative, morphologically rich language, and its language resources are scarce and noisy. Moreover, it is difficult to find bilingual resources that would let it draw on the linguistic knowledge of high-resource languages such as Chinese or English. There is little related research on unsupervised extraction for the Chinese-Uyghur language pair, and existing methods mainly focus on term extraction from translated parallel corpora. Unsupervised knowledge extraction methods are therefore an effective option, especially for low-resource languages. This paper proposes a method to extract a Chinese-Uyghur bilingual dictionary by combining an inter-word relationship matrix with neural cross-language word embeddings. A seed dictionary is used as a weak supervision signal, and a small Chinese-Uyghur parallel data resource is used to map the word vectors of both languages into a unified vector space. Because the word units of the two languages do not align well, stems are used as the main linguistic units. The strong inter-word semantic relationships captured by the word vectors are used to associate Chinese and Uyghur semantic information. Two retrieval criteria, nearest-neighbor retrieval and cross-domain similarity local scaling (CSLS), are used to compute similarity and extract the bilingual dictionary. The experimental results show that the proposed Chinese-Uyghur bilingual dictionary extraction method improves accuracy to 65.06%. This method helps to improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translation.
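The pipeline the abstract describes (map two embedding spaces together using a seed dictionary, then retrieve translations by nearest neighbor or CSLS) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the mapping is learned, so an orthogonal Procrustes solution is assumed here as one common choice, the embeddings are synthetic stand-ins rather than real Chinese or Uyghur stem vectors, and all function names and the neighborhood size `k` are hypothetical.

```python
import numpy as np

def normalize(m):
    """Scale each row to unit length so dot products are cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def procrustes(src_seed, tgt_seed):
    """Orthogonal W minimizing ||src_seed @ W - tgt_seed||_F (assumed mapping)."""
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt

def nn_retrieve(sim):
    """Nearest-neighbor retrieval: most similar target for each source word."""
    return sim.argmax(axis=1)

def csls_retrieve(sim, k=2):
    """CSLS retrieval: penalize words that are close to many neighbors (hubs)."""
    r_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)  # per source word
    r_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)  # per target word
    csls = 2 * sim - r_src[:, None] - r_tgt[None, :]
    return csls.argmax(axis=1)

# Synthetic stand-in data: the "source" space is a rotated copy of the "target"
# space, so word i in the source corresponds to word i in the target.
rng = np.random.default_rng(0)
n_words, dim, n_seed = 60, 16, 30
tgt = normalize(rng.normal(size=(n_words, dim)))
rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
src = tgt @ rotation

# Weak supervision: only the first n_seed word pairs are known in advance.
W = procrustes(src[:n_seed], tgt[:n_seed])

# Map all source vectors into the target space and score every pair.
sim = normalize(src @ W) @ tgt.T
nn_pred = nn_retrieve(sim)
csls_pred = csls_retrieve(sim, k=2)

gold = np.arange(n_words)
print("NN accuracy:  ", (nn_pred == gold).mean())
print("CSLS accuracy:", (csls_pred == gold).mean())
```

On real embeddings the two spaces are only approximately related by a rotation, which is where the two retrieval criteria start to differ: CSLS subtracts each word's average similarity to its nearest neighbors, which tends to reduce the influence of hub words that would otherwise be returned as the translation of many source words.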
Pages: 18
Related papers
50 in total
  • [1] Sub-word based unsupervised bilingual dictionary induction for Chinese-Uyghur
    Aysa, Anwar
    Ablimit, Mijit
    Yilahun, Hankiz
    Hamdulla, Askar
    2022 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2022), 2022, : 476 - 481
  • [2] On the Chinese-Uyghur inscription of 1361
    Franke, H
    ZEITSCHRIFT DER DEUTSCHEN MORGENLANDISCHEN GESELLSCHAFT, 2003, 153 (01): : 143 - 156
  • [3] Exploration of Chinese-Uyghur Neural Machine Translation
    Mahmut, Gulnigar
    Memet, Rehmutulla
    Nijat, Mewlude
    Hamdulla, Askar
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 176 - 179
  • [4] Implementation of Chinese-Uyghur Bilateral EBMT System
    Abiderexiti, Kahaerjiang
    Yao, Tianfang
    Yibulayin, Tuergen
    Wumaier, Aishan
    Yiming, Yasen
    2013 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2013), 2013, : 87 - 90
  • [5] A rule-based approach for Chinese-Uyghur NE machine transliteration
    Maimaitimin, Saimaiti
    Imin, Yasin
    RECENT ADVANCE OF CHINESE COMPUTING TECHNOLOGIES, 2007, : 346 - 349
  • [6] A Neural-Network-Based Approach to Chinese-Uyghur Organization Name Translation
    Wumaier, Aishan
    Xu, Cuiyun
    Kadeer, Zaokere
    Liu, Wenqi
    Wang, Yingbo
    Haierla, Xireaili
    Maimaiti, Maihemuti
    Tian, ShengWei
    Saimaiti, Alimu
    INFORMATION, 2020, 11 (10) : 1 - 18
  • [7] Memory-augmented Chinese-Uyghur Neural Machine Translation
    Zhang, Shiyue
    Mahmut, Gulnigar
    Wang, Dong
    Hamdulla, Askar
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1092 - 1096
  • [8] Markedness of phonological elements and tone match in Chinese-Uyghur contact
    Du, Zhaojin
    Chen, Baoya
    LANGUAGE AND LINGUISTICS, 2017, 18 (03) : 383 - 429
  • [9] Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction
    Kim, Jae-Hoon
    Kwon, Hong-Seok
    Seo, Hyeong-Won
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2015, 2015