Improving Utterance Rewriter Based on MMI and Text Data Augmentation

Cited by: 0
Authors
Yang, Lina [1]
Lin, Hai [1]
Li, Wei [1]
Meng, Zuqiang [1]
Wang, Patrick Shen-Pei [2]
Li, Xichun [3]
Luo, Huiwu [4]
Affiliations
[1] Guangxi Univ, Nanning 530004, Peoples R China
[2] Northeastern Univ, Comp & Informat Sci, Boston, MA 02115 USA
[3] Guangxi Normal Univ Nationalities, Chongzuo 532200, Peoples R China
[4] Changsha Xingshen Intelligent Technol Co Ltd, Changsha 410100, Hunan, Peoples R China
Keywords
Utterance rewrite; multiple dialogues; maximum mutual information; natural language processing; coreference resolution;
DOI
10.1142/S021800142259011X
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In multi-turn dialogue tasks, keeping the model's answers consistent is a major research challenge. Every answer from the model should be time dependent, causal, and logical. To keep the model's personality, dialogue style, and context consistent, the key information in the dialogue history must be retained as far as possible so that the model can generate more accurate answers. Utterance rewriting is a technique that supplements the current sentence with information recovered from the dialogue history, thereby retaining that key information. This paper uses text augmentation, the Maximum Mutual Information (MMI) method, and a character correction method based on the Knuth-Morris-Pratt (KMP) algorithm to improve utterance rewriting generation. Existing utterance rewriting datasets are small, and building them by hand is too costly. Text data augmentation based on coreference resolution is used to fill in the positive data missing from the utterance rewriting dataset and, at the same time, to expand the existing datasets. The generated results are then optimized with the MMI method, and the KMP character correction method is used to fix wrong characters, improving overall accuracy.
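The abstract names the Knuth-Morris-Pratt algorithm as the basis of the character correction step but does not describe how it is applied. As a minimal sketch of the underlying string-matching primitive only, the following Python shows standard KMP search. The function names `build_failure_table` and `kmp_find_all` are hypothetical, and using the search to check whether a rewritten span occurs in the dialogue history is an assumption for illustration, not the paper's stated procedure.

```python
from typing import List


def build_failure_table(pattern: str) -> List[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find_all(text: str, pattern: str) -> List[int]:
    """Return the start index of every (possibly overlapping)
    occurrence of pattern in text. Hypothetical helper name."""
    if not pattern:
        return []
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            # Reuse the matched border so overlapping hits are found.
            k = failure[k - 1]
    return matches


# Assumed usage: test whether a span from a rewritten utterance
# appears verbatim in the dialogue history.
history = "abababa"
span = "aba"
positions = kmp_find_all(history, span)
```

Because the failure table lets the search resume without re-reading text characters, the scan runs in time linear in the combined length of the text and the pattern.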
Pages: 26
Related papers
50 in total
  • [31] Enhancing Children's Short Utterance Based ASV Using Data Augmentation Techniques and Feature Concatenation Approach
    Aziz, Shahid
    Shahnawazuddin, Syed
    SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339 : 380 - 394
  • [32] Hierarchical Data Augmentation and the Application in Text Classification
    Yu, Shujuan
    Yang, Jie
    Liu, Danlei
    Li, Runqi
    Zhang, Yun
    Zhao, Shengmei
    IEEE ACCESS, 2019, 7 : 185476 - 185485
  • [33] Improving the Efficiency of Dysarthria Voice Conversion System Based on Data Augmentation
    Zheng, Wei-Zhong
    Han, Ji-Yan
    Chen, Chen-Yu
    Chang, Yuh-Jer
    Lai, Ying-Hui
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31 : 4613 - 4623
  • [34] Iterative Translation-Based Data Augmentation Method for Text Classification Tasks
    Lee, Sangwon
    Liu, Ling
    Choi, Wonik
    IEEE ACCESS, 2021, 9 : 160437 - 160445
  • [35] GAN-Based Data Augmentation For Improving The Classification Of EEG Signals
    Bhat, Sudhanva
    Hortal, Enrique
    THE 14TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2021, 2021, : 453 - 458
  • [36] Explainability-Based Mix-Up Approach for Text Data Augmentation
    Kwon, Soonki
    Lee, Younghoon
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2023, 17 (01)
  • [37] Parallel Data Augmentation for Text-based Person Re-identification
    Cai, Han-Qing
    Li, Xin
    Ji, Yi
    Li, Ying
    Liu, Chun-Ping
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [38] Data Augmentation for Text Generation Without Any Augmented Data
    Bi, Wei
    Li, Huayang
    Huang, Jiacheng
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2223 - 2237
  • [39] You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
    Laptev, Aleksandr
    Korostik, Roman
    Svischev, Aleksey
    Andrusenko, Andrei
    Medennikov, Ivan
    Rybin, Sergey
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 439 - 444
  • [40] Application of Generative Adversarial Networks and Shapley Algorithm Based on Easy Data Augmentation for Imbalanced Text Data
    Wu, Jheng-Long
    Huang, Shuoyen
    APPLIED SCIENCES-BASEL, 2022, 12 (21)