Profanity-Avoiding Training Framework for Seq2seq Models with Certified Robustness

Cited by: 0
Authors:
Zhang, Hengtong [1,3]
Zheng, Tianhang [2]
Li, Yaliang [4]
Gao, Jing [1]
Su, Lu [1]
Li, Bo [5]
Affiliations:
[1] Purdue University, West Lafayette, IN 47907, USA
[2] University of Toronto, Toronto, ON, Canada
[3] University at Buffalo, Buffalo, NY 14260, USA
[4] Alibaba Group, Hangzhou, China
[5] University of Illinois, Urbana, IL, USA
Funding: US National Science Foundation
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Seq2seq models have demonstrated remarkable effectiveness across a wide variety of applications. However, recent research has shown that inappropriate language in training samples and carefully crafted test inputs can induce seq2seq models to output profanity. Such outputs can hurt the usability of seq2seq models and offend end users. To address this problem, we propose a training framework with certified robustness that eliminates the causes of profanity generation. The framework requires only a short list of profanity examples to prevent seq2seq models from generating a much broader spectrum of profanity. It consists of a pattern-eliminating training component, which suppresses the impact of profane language patterns in the training set, and a trigger-resisting training component, which provides certified robustness against profanity-triggering expressions intentionally injected into test samples. In the experiments, we consider two representative NLP tasks to which seq2seq models are applied, namely style transfer and dialogue generation. Extensive experimental results show that the proposed training framework successfully prevents these models from generating profanity.
Pages: 5151-5161 (11 pages)
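
The abstract gives only a high-level description of the two training components. As a rough, hypothetical illustration of the pattern-eliminating idea, the PyTorch sketch below augments the standard seq2seq cross-entropy loss with a penalty on the probability mass the decoder assigns to tokens from a short profanity list. The function name, penalty weighting, and toy vocabulary ids are assumptions made for illustration, not details taken from the paper.

```python
# A minimal sketch of a pattern-eliminating objective, assuming a
# token-level penalty formulation (hypothetical; not the authors' code).
import torch
import torch.nn.functional as F

def pattern_eliminating_loss(logits, targets, profanity_ids, penalty_weight=1.0):
    """Standard cross-entropy plus a penalty on the probability mass the
    decoder assigns to tokens from a short profanity list.

    logits:        (batch, seq_len, vocab_size) decoder scores
    targets:       (batch, seq_len) gold token ids
    profanity_ids: 1-D LongTensor of vocabulary ids on the profanity list
    """
    vocab_size = logits.size(-1)
    # Token-level cross-entropy over all decoding positions.
    ce = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

    # Total probability placed on listed profanity tokens, averaged
    # over all positions; minimizing this suppresses profane outputs.
    probs = logits.softmax(dim=-1)                  # (batch, seq_len, vocab)
    profane_mass = probs[..., profanity_ids].sum(dim=-1).mean()

    return ce + penalty_weight * profane_mass

# Toy usage with random tensors standing in for a seq2seq decoder's output.
batch, seq_len, vocab = 2, 5, 100
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
targets = torch.randint(0, vocab, (batch, seq_len))
profanity_ids = torch.tensor([7, 42, 99])           # hypothetical list entries
loss = pattern_eliminating_loss(logits, targets, profanity_ids)
loss.backward()
print(float(loss))
```

Penalizing expected probability mass keeps the objective differentiable, unlike hard output filtering. The trigger-resisting component described in the abstract, which certifies robustness against triggers injected into test inputs, relies on certification machinery that this sketch does not attempt to reproduce.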