Profanity-Avoiding Training Framework for Seq2seq Models with Certified Robustness

Cited by: 0
Authors:
Zhang, Hengtong [1,3]
Zheng, Tianhang [2]
Li, Yaliang [4]
Gao, Jing [1]
Su, Lu [1]
Li, Bo [5]
Affiliations:
[1] Purdue University, West Lafayette, IN 47907, USA
[2] University of Toronto, Toronto, ON, Canada
[3] University at Buffalo, Buffalo, NY 14260, USA
[4] Alibaba Group, Hangzhou, China
[5] University of Illinois, Urbana, IL, USA
Funding: US National Science Foundation
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Seq2seq models have demonstrated remarkable effectiveness across a wide variety of applications. However, recent research has shown that inappropriate language in training samples and carefully crafted test inputs can induce seq2seq models to output profanity. Such outputs can hurt the usability of seq2seq models and offend end users. To address this problem, we propose a training framework with certified robustness that eliminates the causes of profanity generation. The framework requires only a short list of profanity examples to prevent seq2seq models from generating a much broader spectrum of profanity. It consists of a pattern-eliminating training component, which suppresses the impact of profane language patterns in the training set, and a trigger-resisting training component, which provides certified robustness against profanity-triggering expressions intentionally injected into test samples. In the experiments, we consider two representative NLP tasks to which seq2seq models are applied, namely style transfer and dialogue generation. Extensive experimental results show that the proposed training framework successfully prevents these models from generating profanity.
Pages: 5151-5161 (11 pages)
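
The abstract gives only a high-level description of the two training components. As a rough, hypothetical illustration of the pattern-eliminating idea, the PyTorch sketch below augments the standard seq2seq cross-entropy loss with a penalty on the probability mass the decoder assigns to tokens from a short profanity list. The function name, penalty weighting, and toy vocabulary ids are assumptions made for illustration, not details taken from the paper.

```python
# A minimal sketch of a pattern-eliminating objective, assuming a
# token-level penalty formulation (hypothetical; not the authors' code).
import torch
import torch.nn.functional as F

def pattern_eliminating_loss(logits, targets, profanity_ids, penalty_weight=1.0):
    """Standard cross-entropy plus a penalty on the probability mass the
    decoder assigns to tokens from a short profanity list.

    logits:        (batch, seq_len, vocab_size) decoder scores
    targets:       (batch, seq_len) gold token ids
    profanity_ids: 1-D LongTensor of vocabulary ids on the profanity list
    """
    vocab_size = logits.size(-1)
    # Token-level cross-entropy over all decoding positions.
    ce = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

    # Total probability placed on listed profanity tokens, averaged
    # over all positions; minimizing this suppresses profane outputs.
    probs = logits.softmax(dim=-1)                  # (batch, seq_len, vocab)
    profane_mass = probs[..., profanity_ids].sum(dim=-1).mean()

    return ce + penalty_weight * profane_mass

# Toy usage with random tensors standing in for a seq2seq decoder's output.
batch, seq_len, vocab = 2, 5, 100
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
targets = torch.randint(0, vocab, (batch, seq_len))
profanity_ids = torch.tensor([7, 42, 99])           # hypothetical list entries
loss = pattern_eliminating_loss(logits, targets, profanity_ids)
loss.backward()
print(float(loss))
```

Penalizing expected probability mass keeps the objective differentiable, unlike hard output filtering. The trigger-resisting component described in the abstract, which certifies robustness against triggers injected into test inputs, relies on certification machinery that this sketch does not attempt to reproduce.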