Towards unifying pre-trained language models for semantic text exchange

Cited by: 1
Authors
Miao, Jingyuan [1 ]
Zhang, Yuqi [2 ]
Jiang, Nan [1 ]
Wen, Jie [3 ]
Pei, Kanglu [4 ]
Wan, Yue [1 ]
Wan, Tao [1 ]
Chen, Honglong [5 ]
Affiliations
[1] East China Jiaotong Univ, Coll Informat Engn, Nanchang 330013, Peoples R China
[2] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
[3] Cent South Univ, Coll Elect & Automation Engn, Hengyang 421001, Peoples R China
[4] Univ Sydney, Sch Math & Stat, Camperdown, NSW 2006, Australia
[5] China Univ Petr, Coll Control Sci & Engn, Qingdao 266580, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
NLP; NLG; Controlled text generation; Text infilling; Semantic text exchange;
DOI
10.1007/s11276-023-03439-w
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In the field of the Social Internet of Things (SIoT), the widespread adoption of Internet of Things (IoT) devices and the rise of social media have produced large volumes of social IoT data, much of it textual, yet mining and analyzing these data remains challenging. To address this, we leverage semantic text exchange, a technique from Natural Language Processing (NLP) that generates text preserving the original style while modifying its semantic content. This task supports a wide range of NLP applications, including data augmentation, textual adversarial attacks, and conversational systems. However, existing semantic text exchange methods suffer from several shortcomings: the generated text is often unnatural, its content lacks diversity, and model training is costly. This paper presents a semantically independent sentence generation method that produces fluent and diverse semantically independent sentences. We combine three word substitution patterns with a pre-trained text infilling model to generate text. The key advantage of the proposed method is that it requires no dataset-specific training of the text infilling model, which greatly reduces the cost of text generation. Experimental results show that our method generates more fluent and more diverse sentences than the baselines, preserves the sentiment of the original sentence well, and maintains high semantic similarity between the replacement words and the generated sentences.
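The abstract reports combining three word substitution patterns with an off-the-shelf pre-trained text infilling model, so no dataset-specific training is needed. As a rough illustration of that idea only, the Python sketch below uses a generic masked language model (Hugging Face transformers, roberta-base) to propose in-context replacements for a single word; the abstract does not spell out the three substitution patterns, and the function name, model choice, and example sentence here are assumptions, not the authors' implementation.

# A minimal sketch of masked-word substitution with an off-the-shelf
# text-infilling (masked language) model, requiring no task-specific training.
# Model name, helper function, and example sentence are illustrative assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

def substitute_word(sentence: str, target_word: str, top_k: int = 5):
    """Mask one occurrence of `target_word` and let the pre-trained model
    propose in-context replacements."""
    masked = sentence.replace(target_word, fill_mask.tokenizer.mask_token, 1)
    candidates = fill_mask(masked, top_k=top_k)
    # Each candidate includes the infilled token and the full rewritten sentence.
    return [(c["token_str"].strip(), c["sequence"]) for c in candidates]

# Example: replace the topical word while keeping the sentence's style and sentiment.
print(substitute_word("The pizza at this restaurant was absolutely amazing.", "pizza"))

In the paper's setting, candidates like these would be filtered and inserted according to the chosen substitution pattern; the point of the sketch is only that a pre-trained infilling model can supply fluent, in-context replacements without fine-tuning.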
Pages: 6385-6398
Number of pages: 14
Related Papers
50 records in total
  • [1] Pre-Trained Language Models for Text Generation: A Survey
    Li, Junyi
    Tang, Tianyi
    Zhao, Wayne Xin
    Nie, Jian-Yun
    Wen, Ji-Rong
    ACM COMPUTING SURVEYS, 2024, 56 (09)
  • [2] The Impact of Pre-trained Language Models on Turkish Semantic Role Labelling
    Oral, Elif
    Eryigit, Gulsen
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [3] Non-Autoregressive Text Generation with Pre-trained Language Models
    Su, Yixuan
    Cai, Deng
    Wang, Yan
    Vandyke, David
    Baker, Simon
    Li, Piji
    Collier, Nigel
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 234 - 243
  • [4] ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining
    Minh Phuc Nguyen
    Vu Hoang Tran
    Vu Hoang
    Ta Duc Huy
    Bui, Trung H.
    Truong, Steven Q. H.
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 328 - 337
  • [5] Pre-Trained Language Models and Their Applications
    Wang, Haifeng
    Li, Jiwei
    Wu, Hua
    Hovy, Eduard
    Sun, Yu
    ENGINEERING, 2023, 25 : 51 - 65
  • [6] Semantic Programming by Example with Pre-trained Models
    Verbruggen, Gust
    Le, Vu
    Gulwani, Sumit
    PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2021, 5 (OOPSLA):
  • [7] Semantic Importance-Aware Communications Using Pre-Trained Language Models
    Guo, Shuaishuai
    Wang, Yanhu
    Li, Shujing
    Saeed, Nasir
    IEEE COMMUNICATIONS LETTERS, 2023, 27 (09) : 2328 - 2332
  • [8] Fusion of Root and Affix Information with Pre-trained Language Models for Text Classification
    Wu, Yujia
    Zhang, Xuan
    Xiao, Guohua
    Ren, Hong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 488 - 498
  • [9] Attribute Alignment: Controlling Text Generation from Pre-trained Language Models
    Yu, Dian
    Yu, Zhou
    Sagae, Kenji
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2251 - 2268
  • [10] Controlling Pre-trained Language Models for Grade-Specific Text Simplification
    Agrawal, Sweta
    Carpuat, Marine
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 12807 - 12819