Automatic Corpus Extension for Data-driven Natural Language Generation

Authors
Manishina, Elena [1]
Jabaian, Bassam [1]
Huet, Stephane [1]
Lefevre, Fabrice [1]
Affiliations
[1] Univ Avignon, LIA CERI, Avignon, France
Keywords
corpus building; natural language generation; automatic paraphrasing
DOI
Not available
Chinese Library Classification (CLC) number
H [Language and Writing]
Discipline classification code
05
Abstract
As data-driven approaches started to make their way into the Natural Language Generation (NLG) domain, the need for automation of corpus building and extension became apparent. Corpus creation and extension in the data-driven NLG domain traditionally involved manual paraphrasing, performed either by a group of experts or through crowd-sourcing. Building the training corpora manually is a costly enterprise which requires a lot of time and human resources. We propose to automate the process of corpus extension by integrating automatically obtained synonyms and paraphrases. Our methodology allowed us to significantly increase the size of the training corpus and its level of variability (the number of distinct tokens and specific syntactic structures). Our extension solutions are fully automatic and require only some initial validation. The human evaluation results confirm that in many cases native users favor the outputs of the model built on the extended corpus.
Pages: 3624-3631
Number of pages: 8