Hybrid Approach Text Generation for Low-Resource Language

Times cited: 0
Authors
Rakhimova, Diana [1 ,2 ]
Adali, Esref [3 ]
Karibayeva, Aidana [1 ,2 ]
Affiliations
[1] Al Farabi Kazakh Natl Univ, Alma Ata 050040, Kazakhstan
[2] Inst Informat & Computat Technol, Alma Ata 050010, Kazakhstan
[3] Istanbul Tech Univ, TR-34485 Istanbul, Turkiye
Keywords
Text generation; low-resource language; Kazakh language; Turkic languages; TF-IDF; RNN; LSTM
DOI
10.1007/978-3-031-70248-8_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text generation is an important tool used by many companies in applications such as chatbots, search engines, and question-answering systems, and it is a prominent trend in artificial intelligence. Generated texts and sentences can serve both educational and entertainment purposes. In particular, generating texts and sentences for children plays an important role in their development, helping them improve their reading, comprehension, and communication skills in the language. Many of the world's languages are currently classed as low-resource. Text generation for low-resource languages is still at an early stage of development, and many problems remain to be solved. One of the main problems is the lack of large datasets and linguistic resources in the public domain, which makes it difficult to apply modern machine learning methods effectively; another is the lack of modern methods and tools for processing and analyzing these languages. This article presents a hybrid approach to text generation, using Turkish and Kazakh as examples. These languages belong to the large group of Turkic languages, together with Kyrgyz, Tatar, Uzbek, and others. An approach based on neural learning with an LSTM model is proposed and implemented, taking into account the structural and semantic properties of the language. Training and testing are carried out on a specially assembled corpus covering various text genres. The quality of the generated text was assessed using the BLEU metric.
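The abstract states that generation quality was measured with BLEU but does not give the configuration (maximum n-gram order, smoothing, sentence- vs corpus-level scoring, tokenization). As a minimal sketch, assuming the standard BLEU formulation of Papineni et al. (clipped n-gram precisions up to 4-grams, uniform weights, brevity penalty, no smoothing) and whitespace tokenization, a sentence-level score can be computed as below. The function names are hypothetical, and this is not the authors' evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights 1/max_n (assumed setup)."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # The geometric mean is zero if any n-gram order has no match.
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Effective reference length: the one closest to c (ties go to the shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


if __name__ == "__main__":
    generated = "the cat sat on the mat".split()
    reference = "the cat sat on a mat".split()
    # Precisions 5/6, 3/5, 1/2, 1/3 and no brevity penalty give about 0.537.
    print(round(sentence_bleu(generated, [reference]), 3))
```

Because this sketch is unsmoothed, any generated sentence without a matching 4-gram scores 0. For short outputs, such as simple sentences for children, a smoothed or corpus-level BLEU, as provided by standard evaluation toolkits, is usually more informative.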
Pages: 256–268
Number of pages: 13