Hybrid Approach Text Generation for Low-Resource Language

Cited by: 0
Authors
Rakhimova, Diana [1, 2]
Adali, Esref [3]
Karibayeva, Aidana [1, 2]
Affiliations
[1] Al Farabi Kazakh Natl Univ, Alma Ata 050040, Kazakhstan
[2] Inst Informat & Computat Technol, Alma Ata 050010, Kazakhstan
[3] Istanbul Tech Univ, TR-34485 Istanbul, Turkiye
Keywords
Text generation; low-resource language; Kazakh language; Turkish languages; TF-IDF; RNN; LSTM
DOI
10.1007/978-3-031-70248-8_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text generation is a prominent area of artificial intelligence, used by companies in applications such as chatbots, search engines, and question-answering systems. Generated texts and sentences can serve both educational and entertainment purposes. In particular, generating texts and sentences for children can support their development by helping them improve their reading, comprehension, and communication skills in the language. Many of the world's languages are currently low-resource. Text generation for low-resource languages is still at an early stage of development, and many problems remain to be solved. One of the main problems is the lack of large datasets and linguistic resources in the public domain, which makes it difficult to apply modern machine learning methods effectively; another is the lack of modern methods and tools for analyzing and processing these languages. This article presents a hybrid approach to text generation, using Turkish and Kazakh as examples. Both belong to the large Turkic language family, along with Kyrgyz, Tatar, Uzbek, and other languages. A neural approach based on the LSTM model is proposed and implemented, taking into account the structural and semantic properties of the language. Training and testing are carried out on an assembled corpus covering various text genres. The quality of the generated text was assessed using the BLEU metric.
Pages: 256-268
Page count: 13