Hybrid Approach Text Generation for Low-Resource Language

Cited by: 0
Authors
Rakhimova, Diana [1, 2]
Adali, Esref [3]
Karibayeva, Aidana [1, 2]
Affiliations
[1] Al-Farabi Kazakh National University, Alma Ata 050040, Kazakhstan
[2] Institute of Information and Computational Technologies, Alma Ata 050010, Kazakhstan
[3] Istanbul Technical University, TR-34485 Istanbul, Turkiye
Keywords
Text generation; low-resource language; Kazakh language; Turkic languages; TF-IDF; RNN; LSTM
DOI
10.1007/978-3-031-70248-8_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text generation is an important tool used by many companies in fields such as chatbots, search engines, and question-answering systems, and it is a prominent trend in artificial intelligence. Generated texts and sentences can serve both educational and entertainment purposes. In particular, generating texts and sentences for children plays an important role in their development, helping them improve their reading, comprehension, and communication skills in the language. Many of the world's languages are currently classified as low-resource. Text generation for low-resource languages is still at an early stage of development, and many problems remain to be solved. One of the main problems is the lack of large datasets and publicly available linguistic resources, which makes it difficult to apply modern machine learning methods effectively; another is the lack of modern methods and tools for analyzing and processing these languages. This article presents a hybrid approach to text generation, using the Turkish and Kazakh languages as examples. These languages belong to the large Turkic language group, together with Kyrgyz, Tatar, Uzbek, and others. An approach based on neural learning with an LSTM model is proposed and implemented, taking into account the structural and semantic properties of the language. Training and testing are carried out on a compiled corpus covering various text genres. The quality of the generated text was assessed using the BLEU metric.
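The abstract reports generation quality with BLEU but does not state which variant was used (n-gram order, smoothing, tokenization, sentence- or corpus-level). As a minimal sketch, assuming sentence-level BLEU with uniform weights over 1- to 4-grams, clipped n-gram counts, the standard brevity penalty and no smoothing, the score for one generated sentence could be computed as below. The function names and the Kazakh example sentences are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU (assumed variant): uniform weights, no smoothing.

    candidate: list of tokens produced by the generator.
    references: list of reference token lists.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


# Hypothetical example: a generated Kazakh sentence against one reference.
generated = "бала кітап оқып отыр".split()
reference = "бала кітап оқып жатыр".split()
print(sentence_bleu(generated, [reference], max_n=2))
```

With only four tokens and one differing word, the 4-gram precision is zero, so unsmoothed BLEU-4 returns 0.0; this is why short generated sentences are often scored with a lower n-gram order or with smoothing. For reported results, an established toolkit such as NLTK or sacreBLEU is the safer choice.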
Pages: 256-268
Number of pages: 13