Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data

Cited by: 0

Authors:

Filippova, Katja ^{[1]}

Affiliations:

[1] Google Res, Berlin, Germany

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020 | 2020

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Neural text generation (data- or text-to-text) demonstrates remarkable performance when training data is abundant, which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used, but they inevitably let noise into the data, such as phrases in the output which cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate, i.e., generate fluent but unsupported text. Our contribution is a simple but powerful technique to treat such hallucinations as a controllable aspect of the generated text, without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique both in an automatic and in a human evaluation.

Pages: 864-870

Page count: 7
