Hybrid Approach Text Generation for Low-Resource Language

Authors
Rakhimova, Diana [1, 2]
Adali, Esref [3]
Karibayeva, Aidana [1, 2]
Affiliations
[1] Al Farabi Kazakh Natl Univ, Alma Ata 050040, Kazakhstan
[2] Inst Informat & Computat Technol, Alma Ata 050010, Kazakhstan
[3] Istanbul Tech Univ, TR-34485 Istanbul, Turkiye
Keywords
Text generation; low-resource language; Kazakh language; Turkish languages; TF-IDF; RNN; LSTM
DOI
10.1007/978-3-031-70248-8_20
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text generation is widely used by companies in applications such as chatbots, search engines, and question-answering systems, and is an active area of artificial intelligence. Generated texts and sentences can serve both educational and entertainment purposes. In particular, generating texts and sentences for children can support their development by helping them improve their reading, comprehension, and communication skills in the language. Many of the world's languages are currently classed as low-resource. Text generation for low-resource languages is still at an early stage of development, and many problems remain to be solved. One of the main problems is the lack of large datasets and linguistic resources in the public domain, which makes it difficult to apply modern machine learning methods effectively; another is the lack of modern methods and tools for analyzing and processing these languages. This article presents a hybrid approach to text generation, using Turkish and Kazakh as examples. Both belong to the large group of Turkic languages, along with Kyrgyz, Tatar, Uzbek, and others. An approach based on neural learning with an LSTM model, which takes into account the structural and semantic properties of the language, is proposed and implemented. Training and testing are carried out on an assembled corpus covering various text genres. The quality of the generated text is assessed using the BLEU metric.
Pages: 256-268
Page count: 13