Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data

Cited by: 0
Authors
Filippova, Katja [1]
Affiliations
[1] Google Res, Berlin, Germany
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural text generation (data- or text-to-text) demonstrates remarkable performance when training data is abundant, which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used, but they inevitably let noise into the data, such as phrases in the output which cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate, that is, generate fluent but unsupported text. Our contribution is a simple but powerful technique to treat such hallucinations as a controllable aspect of the generated text, without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique both in an automatic and in a human evaluation.
Pages: 864-870
Number of pages: 7