Improving Utterance Rewriter Based on MMI and Text Data Augmentation

Cited by: 0
Authors
Yang, Lina [1]
Lin, Hai [1]
Li, Wei [1]
Meng, Zuqiang [1]
Wang, Patrick Shen-Pei [2]
Li, Xichun [3]
Luo, Huiwu [4]
Affiliations
[1] Guangxi Univ, Nanning 530004, Peoples R China
[2] Northeastern Univ, Comp & Informat Sci, Boston, MA 02115 USA
[3] Guangxi Normal Univ Nationalities, Chongzuo 532200, Peoples R China
[4] Changsha Xingshen Intelligent Technol Co Ltd, Changsha 410100, Hunan, Peoples R China
Keywords
Utterance rewrite; multiple dialogues; maximum mutual information; natural language processing; coreference resolution;
DOI
10.1142/S021800142259011X
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In multi-round dialogue tasks, keeping the model's answers consistent is a major research challenge. Every answer from the model should be time dependent, causal, and logical. To keep the model's personality, dialogue style, and context consistent, the key information in the dialogue history must be retained as far as possible so that the model can generate more accurate answers. Utterance rewriting is a technique that supplements the current sentence with information recovered by analyzing the dialogue history, so that this key information is retained. This paper uses text data augmentation, the Maximum Mutual Information (MMI) method, and a character correction method based on the Knuth-Morris-Pratt (KMP) algorithm to improve the quality of generated utterance rewrites. Existing utterance rewriting datasets are small, and building them by hand is too costly. Text data augmentation based on coreference resolution is used to supply the positive examples missing from the utterance rewriting dataset and, at the same time, to expand the existing datasets. The generated results are then optimized with the MMI method, and the KMP-based character correction method is used to fix wrong characters and improve overall accuracy.
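The abstract names the Knuth-Morris-Pratt algorithm but does not describe how the paper's character correction step uses it. As background only, the following is a minimal sketch of standard KMP substring matching; the function names `build_failure_table` and `kmp_find_all` are illustrative assumptions and do not come from the paper.

```python
def build_failure_table(pattern: str) -> list[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping)
    occurrence of pattern in text. An empty pattern yields no matches."""
    if not pattern:
        return []
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            # Keep scanning so overlapping occurrences are found too.
            k = failure[k - 1]
    return matches


if __name__ == "__main__":
    print(kmp_find_all("abababa", "aba"))
```

Because the failure table lets the scan resume without re-reading text characters, matching runs in time linear in the combined length of the text and the pattern. How the matched positions would drive a correction step is not stated in the abstract and is left open here.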
Pages: 26