Learning bilingual word embedding for automatic text summarization in low resource language

Cited by: 4
Authors
Wijayanti, Rini [1, 3]
Khodra, Masayu Leylia [1, 2]
Surendro, Kridanto [1]
Widyantoro, Dwi H. [1, 2]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung, Indonesia
[2] Univ Ctr Excellence Artificial Intelligence Vis, Inst Teknol Bandung, Nat Language Proc & Big Data Analyt U CoE AI VLB, Bandung, Indonesia
[3] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung 40132, Indonesia
Keywords
Bilingual word embedding; Cross-lingual transfer learning; Extractive summarization; Low-resource language
DOI
10.1016/j.jksuci.2023.03.015
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Studies in low-resource languages have become more challenging with the increasing volume of texts in today's digital era. The lack of labeled data and text processing libraries further widens the research gap between high-resource and low-resource languages, such as English and Indonesian. This has led to the use of transfer learning, which applies pre-trained models to similar problems, even across languages, by using bilingual or cross-lingual word embeddings. This study therefore investigates two bilingual word embedding methods, VecMap and BiVec, for the Indonesian-English language pair and evaluates them on bilingual lexicon induction and text summarization tasks. The generated bilingual embeddings were compared with MUSE (Multilingual Unsupervised and Supervised Embeddings), an existing multilingual word embedding created with a generative adversarial network method. Furthermore, VecMap was improved by creating shared vocabulary spaces and mapping the unshared ones between languages. The results showed that the embedding produced by the joint method, BiVec, performed better in intrinsic evaluation, especially with CSLS (Cross-Domain Similarity Local Scaling) retrieval. Meanwhile, the improved VecMap outperformed the regular VecMap by 16.6% without surpassing the BiVec evaluation score. These methods enabled model transfer between languages when applied to cross-lingual text summarization. Moreover, the ROUGE score outperformed classical text summarization when only 10% of the target-language training dataset was added. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access
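For readers unfamiliar with the CSLS retrieval mentioned in the abstract, the following is a minimal sketch of the standard CSLS criterion for bilingual lexicon induction, not the authors' implementation. It assumes the source and target embeddings already live in a shared space (for example after a VecMap-style mapping). The function name `csls_retrieve`, the neighborhood size `k`, and the toy data are illustrative assumptions.

```python
import numpy as np

def csls_retrieve(src, tgt, k=10):
    """For each source vector, return the index of the best target vector under CSLS.

    src: (n_src, d) array, tgt: (n_tgt, d) array, both in the same embedding space.
    k is the neighborhood size (a hypothetical default, not taken from the paper).
    """
    # Normalize rows so that dot products equal cosine similarities.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape (n_src, n_tgt)

    k = min(k, sim.shape[0], sim.shape[1])
    # Mean similarity of each source vector to its k nearest target vectors.
    r_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)
    # Mean similarity of each target vector to its k nearest source vectors.
    r_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)

    # CSLS(x, y) = 2 * cos(x, y) - r_T(x) - r_S(y); penalizes "hub" vectors.
    csls = 2 * sim - r_src[:, None] - r_tgt[None, :]
    return csls.argmax(axis=1)

# Toy usage: target vectors are random, source vectors are a shuffled, noisy copy.
rng = np.random.default_rng(0)
tgt_vecs = rng.normal(size=(20, 64))
perm = rng.permutation(20)
src_vecs = tgt_vecs[perm] + 0.01 * rng.normal(size=(20, 64))
pred = csls_retrieve(src_vecs, tgt_vecs, k=3)
```

Compared with plain nearest-neighbor retrieval on cosine similarity, the two subtracted terms lower the score of vectors that are close to many others, which is the motivation usually given for CSLS.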
Pages: 224-235
Page count: 12