Multilingual Controllable Transformer-Based Lexical Simplification

Cited by: 0
Authors
Sheang, Kim Cheng [1]
Saggion, Horacio [1]
Affiliations
[1] Univ Pompeu Fabra, LaSTUS Grp, TALN Lab, DTIC, Barcelona, Spain
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2023 / Issue 71
Keywords
Multilingual Lexical Simplification; Controllable Lexical Simplification; Text Simplification; Multilinguality
DOI
10.26342/2023-71-9
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Therefore, suggesting simpler alternatives for complex words without compromising meaning would help convey the information to a broader audience. This paper proposes mTLS, a multilingual controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model. The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pretrained masked language models to learn simpler alternatives for complex words. The evaluation results on three well-known LS datasets (LexMTurk, BenchLS, and NNSEval) show that our model outperforms previous state-of-the-art models such as LSBert and ConLS. Further evaluation on part of the recent TSAR-2022 multilingual LS shared-task dataset shows that our model performs competitively with the participating systems for English LS and even outperforms the GPT-3 model on several metrics. Our model also obtains performance gains for Spanish and Portuguese.
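As a rough illustration of the three ingredients the abstract names (a language-specific prefix, control tokens, and candidates from a masked language model), the sketch below assembles a single text-to-text input string. It is not the authors' actual input format: the prefix strings, the control-token names (WL, WR), the [T] ... [/T] markers, and the helper build_input are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical language-specific task prefixes (assumed, not from the paper).
PREFIXES = {"en": "simplify en:", "es": "simplify es:", "pt": "simplify pt:"}


@dataclass
class LSExample:
    language: str                  # "en", "es", or "pt"
    sentence: str                  # sentence containing the complex word
    complex_word: str              # word to be simplified
    candidates: List[str] = field(default_factory=list)      # e.g. from a masked LM
    controls: Dict[str, float] = field(default_factory=dict)  # hypothetical control values


def build_input(ex: LSExample) -> str:
    """Assemble prefix + control tokens + marked sentence + candidates."""
    if ex.language not in PREFIXES:
        raise ValueError(f"unsupported language: {ex.language}")
    if ex.complex_word not in ex.sentence:
        raise ValueError("complex word not found in sentence")

    # Control tokens rendered as <NAME_value>, sorted by name for determinism.
    control_tokens = [f"<{name}_{value:.2f}>" for name, value in sorted(ex.controls.items())]

    # Mark only the first occurrence of the complex word.
    marked = ex.sentence.replace(ex.complex_word, f"[T] {ex.complex_word} [/T]", 1)

    parts = [PREFIXES[ex.language], *control_tokens, marked]
    if ex.candidates:
        parts.append("candidates: " + " ".join(ex.candidates))
    return " ".join(parts)


example = LSExample(
    language="en",
    sentence="The cat perched on the mat.",
    complex_word="perched",
    candidates=["sat", "rested"],
    controls={"WL": 0.75, "WR": 1.0},
)
print(build_input(example))
```

A string built this way could then be passed to a sequence-to-sequence model's tokenizer; how the control values are computed and which candidates are kept are design choices the abstract does not specify.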
Pages: 109-123
Page count: 15
Related papers
50 in total
  • [11] A Transformer-Based Approach to Multilingual Fake News Detection in Low-Resource Languages
    De, Arkadipta
    Bandyopadhyay, Dibyanayan
    Gain, Baban
    Ekbal, Asif
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (01)
  • [12] Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion
    Al-onazi, Badriyya B.
    Nauman, Muhammad Asif
    Jahangir, Rashid
    Malik, Muhmmad Mohsin
    Alkhammash, Eman H.
    Elshewey, Ahmed M.
    APPLIED SCIENCES-BASEL, 2022, 12 (18)
  • [13] Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
    Minh Van Nguyen
    Viet Lai
    Ben Veyseh, Amir Pouran
    Thien Huu Nguyen
    EACL 2021: THE 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: PROCEEDINGS OF THE SYSTEM DEMONSTRATIONS, 2021, : 80 - 90
  • [14] Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese
    de Lima Santos, Diego Bernardes
    de Carvalho Dutra, Frederico Giffoni
    Parreiras, Fernando Silva
    Brandao, Wladmir Cardoso
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1, 2021, : 473 - 483
  • [15] TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
    Xu, Yanbo
    Yin, Yueqin
    Jiang, Liming
    Wu, Qianyi
    Zheng, Chengyao
    Loy, Chen Change
    Dai, Bo
    Wu, Wayne
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 7673 - 7682
  • [16] Transformer-Based Multilingual G2P Converter for E-Learning System
    Liu, Jueting
    Ren, Chang
    Luan, Yaoxuan
    Li, Sicheng
    Xie, Tianshi
    Seals, Cheryl
    Atkins, Marisha Speights
    ARTIFICIAL INTELLIGENCE IN HCI, AI-HCI 2022, 2022, 13336 : 546 - 556
  • [17] LSBert: Lexical Simplification Based on BERT
    Qiang, Jipeng
    Li, Yun
    Zhu, Yi
    Yuan, Yunhao
    Shi, Yang
    Wu, Xindong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3064 - 3076
  • [18] Transformer-Based Learned Optimization
    Gartner, Erik
    Metz, Luke
    Andriluka, Mykhaylo
    Freeman, C. Daniel
    Sminchisescu, Cristian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11970 - 11979
  • [19] Transformer-based Image Compression
    Lu, Ming
    Guo, Peiyao
    Shi, Huiqing
    Cao, Chuntong
    Ma, Zhan
    DCC 2022: 2022 DATA COMPRESSION CONFERENCE (DCC), 2022, : 469 - 469
  • [20] Transformer-based Cross-Lingual Summarization using Multilingual Word Embeddings for English - Bahasa Indonesia
    Abka, Achmad F.
    Azizah, Kurniawati
    Jatmiko, Wisnu
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12) : 636 - 645