Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data

Cited by: 0
Authors: Filippova, Katja [1]
Affiliation: [1] Google Research, Berlin, Germany
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Neural text generation (data-to-text or text-to-text) demonstrates remarkable performance when training data is abundant, which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used, but they inevitably let noise into the data, such as phrases in the output that cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate, that is, generate fluent but unsupported text. Our contribution is a simple but powerful technique to treat such hallucinations as a controllable aspect of the generated text, without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique in both an automatic and a human evaluation.
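The abstract does not say how the degree of hallucination is measured or how the control signal reaches the model. What follows is only an illustrative sketch of one way to make noise "controllable" without changing the architecture, under stated assumptions: each training pair is scored by the fraction of target tokens that never appear in the source (a crude word-overlap proxy for "phrases in the output which cannot be explained by the input"), the score is mapped to one of two hypothetical control tags (`<faithful>`, `<hallucinated>`) prepended to the input, and the faithful tag is always used at inference time. The tag names, the 0.5 threshold, and the whitespace tokenization are assumptions for illustration, not details taken from the paper.

```python
from typing import Tuple

# Hypothetical control tokens (assumption: the abstract names no tags).
LOW_TAG = "<faithful>"
HIGH_TAG = "<hallucinated>"


def unsupported_ratio(source: str, target: str) -> float:
    """Fraction of target tokens that do not occur anywhere in the source.

    Assumption: lowercase whitespace tokenization as a crude proxy for
    output content that the input cannot explain.
    """
    src_tokens = set(source.lower().split())
    tgt_tokens = target.lower().split()
    if not tgt_tokens:
        return 0.0
    unsupported = sum(1 for tok in tgt_tokens if tok not in src_tokens)
    return unsupported / len(tgt_tokens)


def add_control_tag(source: str, target: str,
                    threshold: float = 0.5) -> Tuple[str, str]:
    """Prefix the source with a tag describing how noisy its target is.

    The pair is kept (no training input is discarded); only the input
    string changes, so an unmodified seq2seq model can be trained on it.
    The 0.5 threshold is a hypothetical choice.
    """
    if unsupported_ratio(source, target) > threshold:
        tag = HIGH_TAG
    else:
        tag = LOW_TAG
    return f"{tag} {source}", target


def inference_input(source: str) -> str:
    """At generation time, always request the low-hallucination setting."""
    return f"{LOW_TAG} {source}"
```

For example, the pair ("name: ada lovelace", "ada lovelace was an english mathematician and writer") has 6 of 8 target tokens unsupported, so its training input becomes "<hallucinated> name: ada lovelace". Every pair stays in the training set, and only the input string carries the noise label.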
Pages: 864–870
Number of pages: 7