Hybrid Approach Text Generation for Low-Resource Language

Times cited: 0
Authors
Rakhimova, Diana [1, 2]
Adali, Esref [3]
Karibayeva, Aidana [1, 2]
Affiliations
[1] Al Farabi Kazakh Natl Univ, Alma Ata 050040, Kazakhstan
[2] Inst Informat & Computat Technol, Alma Ata 050010, Kazakhstan
[3] Istanbul Tech Univ, TR-34485 Istanbul, Turkiye
Keywords
Text generation; low-resource language; Kazakh language; Turkic languages; TF-IDF; RNN; LSTM
DOI
10.1007/978-3-031-70248-8_20
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text generation is an important tool used by many companies in applications such as chatbots, search engines, and question-answering systems, and it is a major trend in artificial intelligence. Generated texts and sentences can serve both educational and entertainment purposes. In natural language processing, generating texts and sentences for children plays an important role in their development, helping them improve their reading, comprehension, and communication skills in the language. Many of the world's languages are currently low-resource. Text generation for low-resource languages is still at an early stage of development, and many problems remain to be solved. One of the main problems is the lack of large datasets and publicly available linguistic resources, which makes it difficult to apply modern machine learning methods effectively. Another is the lack of modern methods and tools for processing these languages. This article presents a hybrid approach to text generation, using Turkish and Kazakh as examples. Both languages belong to the large Turkic language group, along with Kyrgyz, Tatar, Uzbek, and others. The proposed approach is based on neural learning with an LSTM model and takes into account the structural and semantic properties of the language. Training and testing are carried out on the compiled corpus, which covers various text genres. The quality of the generated text was evaluated with the BLEU metric.
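The abstract states that generation quality was measured with BLEU. As an illustration only, the sketch below computes standard corpus-level BLEU (clipped n-gram precision up to 4-grams, uniform weights, and a brevity penalty) in plain Python. It is not the authors' evaluation code: whitespace tokenization, the absence of smoothing, the function names `ngrams` and `corpus_bleu`, and the example sentences are all assumptions made for this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with uniform weights and no smoothing (illustrative sketch).

    hypotheses: list of token lists, one per generated sentence
    references: list of lists of token lists (one or more references per hypothesis)
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # number of hypothesis n-grams, per n-gram order
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the
        # hypothesis (ties broken toward the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Clip each n-gram count by its maximum count in any single reference;
            # Counter's | operator keeps the elementwise maximum.
            max_ref_counts = Counter()
            for r in refs:
                max_ref_counts |= ngrams(r, n)
            matches[n - 1] += sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    # Geometric mean of the modified precisions p_1 .. p_max_n.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical Kazakh example: "The child is reading a book."
    ref = "бала кітап оқып отыр".split()
    print(corpus_bleu([ref], [[ref]]))  # prints 1.0
    # A 3-token hypothesis has no 4-grams, so unsmoothed BLEU is 0.
    print(corpus_bleu(["бала кітап оқиды".split()], [[ref]]))  # prints 0.0
```

The second call shows why short generated sentences are often scored with a smoothed variant of BLEU: without smoothing, a single empty n-gram order drives the whole score to zero.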
Pages: 256-268
Number of pages: 13