Screening through a broad pool: Towards better diversity for lexically constrained text generation

Cited by: 1
Authors
Yuan, Changsen [1]
Huang, Heyan [1]
Cao, Yixin [2]
Cao, Qianwen [3]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Singapore Management Univ, Singapore, Singapore
[3] China Univ Petr, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Constrained text generation; Text diversity; Randomly insert; Randomly mask; Pre-trained language models;
DOI
10.1016/j.ipm.2023.103602
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Lexically constrained text generation (CTG) aims to generate text that contains given keywords as constraints. However, the diversity of text produced by existing models is still unsatisfactory. In this paper, we propose a lightweight dynamic refinement strategy that increases the randomness of inference to improve the richness and diversity of generation while maintaining a high level of fluency and integrity. Our basic idea is to enlarge the number and length of candidate sentences in each iteration and choose the best one for subsequent refinement. On the one hand, unlike previous works, which carefully insert one token between two words per action, we insert an uncertain number of tokens following a well-designed distribution. To ensure high-quality decoding, the insertion number increases as more words are generated. On the other hand, we randomly mask an increasing number of generated words to force Pre-trained Language Models (PLMs) to examine the whole sentence via reconstruction. We conducted extensive experiments and designed four dimensions for human evaluation. Compared with an important baseline, CBART (He, 2021), our method improves by 1.3% (B-2), 0.1% (B-4), 0.016 (N-2), 0.016 (N-4), 5.7% (M), 1.9% (SB-4), 0.6% (D-2), and 0.5% (D-4) on the One-Billion-Word dataset (Chelba et al., 2014), and by 1.6% (B-2), 0.1% (B-4), 0.121 (N-2), 0.120 (N-4), 0.0% (M), 6.7% (SB-4), 2.7% (D-2), and 3.8% (D-4) on the Yelp dataset (Cho et al., 2018). The results demonstrate that our method produces more diverse and plausible text.
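The candidate-pool idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the insertion distribution, the masking schedule, or the scoring function, so the uniform sampling, the caps, the rule that keywords are never masked, and all names here (`sample_insert_count`, `expand`, `random_mask`, `build_pool`, `screen`) are assumptions made for illustration. The PLM step that fills the `<mask>` placeholders is omitted, and the scorer is a caller-supplied function.

```python
import random

PLACEHOLDER = "<mask>"  # hypothetical slot symbol that a PLM would later fill


def sample_insert_count(step, max_per_slot=3, rng=random):
    """Sample how many placeholders to insert into one gap.

    Assumption: a uniform draw whose upper bound grows with the
    iteration step, capped at max_per_slot.
    """
    upper = min(step + 1, max_per_slot)
    return rng.randint(0, upper)


def expand(tokens, step, rng=random):
    """Insert a random number of placeholders between adjacent tokens."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i < len(tokens) - 1:
            out.extend([PLACEHOLDER] * sample_insert_count(step, rng=rng))
    return out


def random_mask(tokens, keywords, step, base_rate=0.1, rng=random):
    """Mask generated (non-keyword) tokens at a rate that grows with step.

    Assumption: constraint keywords are never masked, and the rate is
    linear in step with a cap of 0.5.
    """
    rate = min(base_rate * (step + 1), 0.5)
    masked = []
    for tok in tokens:
        maskable = tok not in keywords and tok != PLACEHOLDER
        masked.append(PLACEHOLDER if maskable and rng.random() < rate else tok)
    return masked


def build_pool(tokens, keywords, step, pool_size=8, seed=0):
    """Build a pool of randomized candidates for one refinement iteration."""
    rng = random.Random(seed)
    pool = []
    for _ in range(pool_size):
        candidate = expand(tokens, step, rng=rng)
        pool.append(random_mask(candidate, keywords, step, rng=rng))
    return pool


def screen(pool, score):
    """Keep the highest-scoring candidate for the next iteration."""
    return max(pool, key=score)
```

In a fuller pipeline, each candidate would be passed to a pre-trained language model to fill its placeholders, and `score` would measure the quality of the filled sentence; both are left out here because the abstract does not describe them.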
Pages: 12
Related papers
12 in total
  • [1] Parallel Refinements for Lexically Constrained Text Generation with BART
    He, Xingwei
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8653 - 8666
  • [2] Gradient-guided Unsupervised Lexically Constrained Text Generation
    Sha, Lei
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 8692 - 8703
  • [3] RAR: Recombination and augmented replacement method for insertion-based lexically constrained text generation
    Kang, Fengrui
    Huang, Xianying
    Li, Bingyu
    NEUROCOMPUTING, 2024, 597
  • [4] Towards Better Hierarchical Text Classification with Data Generation
    Wang, Yue
    Qiao, Dan
    Li, Juntao
    Chang, Jinxiong
    Zhang, Qishen
    Liu, Zhongyi
    Zhang, Guannan
    Zhang, Min
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 7722 - 7739
  • [5] Towards Content Transfer through Grounded Text Generation
    Prabhumoye, Shrimai
    Quirk, Chris
    Galley, Michel
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 2622 - 2632
  • [6] Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling
    Xiong, Chenyan
    Liu, Zhengzhong
    Callan, Jamie
    Liu, Tie-Yan
    ACM/SIGIR PROCEEDINGS 2018, 2018, : 575 - 584
  • [7] Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation
    Tan, Zhaorui
    Yang, Xi
    Ye, Zihan
    Wang, Qiufeng
    Yan, Yuyao
    Nguyen, Anh
    Huang, Kaizhu
    PATTERN RECOGNITION, 2023, 144
  • [8] Improving Radiology Report Generation Quality and Diversity through Reinforcement Learning and Text Augmentation
    Parres, Daniel
    Albiol, Alberto
    Paredes, Roberto
    BIOENGINEERING-BASEL, 2024, 11 (04):
  • [9] Towards better implementation of cancer screening in Europe through improved monitoring and evaluation and greater engagement of cancer registries
    Anttila, Ahti
    Lonnberg, Stefan
    Ponti, Antonio
    Suonio, Eero
    Villain, Patricia
    Coebergh, Jan Willem
    von Karsa, Lawrence
    EUROPEAN JOURNAL OF CANCER, 2015, 51 (02) : 241 - 251
  • [10] Towards better implementation of cancer screening in Europe through improved monitoring and evaluation and greater engagement of cancer registries (Reprinted)
    Anttila, Ahti
    Lonnberg, Stefan
    Ponti, Antonio
    Suonio, Eero
    Villain, Patricia
    Coebergh, Jan Willem
    von Karsa, Lawrence
    EUROPEAN JOURNAL OF CANCER, 2015, 51 (09) : 1080 - 1081