Automatic Corpus Extension for Data-driven Natural Language Generation

Cited by: 0
Authors
Manishina, Elena [1]
Jabaian, Bassam [1]
Huet, Stephane [1]
Lefevre, Fabrice [1]
Affiliation
[1] Univ Avignon, LIA CERI, Avignon, France
Keywords
corpus building; natural language generation; automatic paraphrasing
DOI
Not available
Chinese Library Classification (CLC) number
H [Language and Writing]
Subject classification code
05
Abstract
As data-driven approaches have made their way into the Natural Language Generation (NLG) domain, the need to automate corpus building and extension has become apparent. In data-driven NLG, corpus creation and extension have traditionally relied on manual paraphrasing, performed either by a group of experts or through crowd-sourcing. Building training corpora manually is costly and requires considerable time and human resources. We propose to automate corpus extension by integrating automatically obtained synonyms and paraphrases. Our methodology allowed us to significantly increase both the size of the training corpus and its variability (the number of distinct tokens and of specific syntactic structures). Our extension solutions are fully automatic and require only some initial validation. Human evaluation results confirm that, in many cases, native users prefer the outputs of the model trained on the extended corpus.
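The abstract gives no implementation details. As a rough illustration only, here is a minimal sketch of the simplest form of synonym-based extension, with several assumptions. Each token in a sentence may be swapped for any synonym listed in a hand-written dictionary (`SYNONYMS`, hypothetical; the paper obtains synonyms automatically), and every combination becomes a new training sentence. Variability is approximated by the number of distinct tokens, one of the two measures the abstract names. Paraphrase integration and syntactic-structure counts are not modeled. The function names `extend_sentence`, `extend_corpus`, and `distinct_tokens` are illustrative and do not come from the paper.

```python
from itertools import product

# Hypothetical synonym table. The paper obtains synonyms automatically;
# here they are hard-coded purely for illustration.
SYNONYMS: dict[str, list[str]] = {
    "cheap": ["inexpensive", "affordable"],
    "restaurant": ["eatery"],
}


def extend_sentence(sentence: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Return all variants of `sentence` in which each token is kept or
    replaced by one of its listed synonyms. The first variant is the original."""
    options = [[tok] + synonyms.get(tok, []) for tok in sentence.split()]
    return [" ".join(choice) for choice in product(*options)]


def extend_corpus(corpus: list[str], synonyms: dict[str, list[str]]) -> list[str]:
    """Expand every sentence and drop exact duplicates, preserving order."""
    seen: set[str] = set()
    extended: list[str] = []
    for sentence in corpus:
        for variant in extend_sentence(sentence, synonyms):
            if variant not in seen:
                seen.add(variant)
                extended.append(variant)
    return extended


def distinct_tokens(corpus: list[str]) -> int:
    """Variability proxy: number of distinct whitespace-separated tokens."""
    return len({tok for sentence in corpus for tok in sentence.split()})


if __name__ == "__main__":
    corpus = ["the restaurant serves cheap food"]
    extended = extend_corpus(corpus, SYNONYMS)
    print(len(corpus), len(extended))                          # 1 6
    print(distinct_tokens(corpus), distinct_tokens(extended))  # 5 8
```

On the toy sentence, one input yields six variants (3 × 2 substitution choices), and the distinct-token count rises from 5 to 8. The number of variants grows multiplicatively with the number of substitutable tokens. Context-blind substitution can also produce ungrammatical output, so a practical pipeline would likely need to sample or filter the variants.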
Pages: 3624-3631
Number of pages: 8