Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay

Cited by: 0
Authors
Sakuma, Jin [1]
Fujie, Shinya [1, 2]
Zhao, Huaibo [1]
Kobayashi, Tetsunori [1]
Affiliations
[1] Waseda Univ, Tokyo, Japan
[2] Chiba Inst Technol, Chiba, Japan
Keywords
spoken dialog systems; turn-taking; response timing; streaming ASR
DOI
10.21437/Interspeech.2023-1618
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In conversational systems, the proper timing of the system's response is critical to maintaining a comfortable conversation. To estimate appropriate timing, it is important to know what the user has said, including the most recent words, but ASR delay usually prevents the full user utterance from being used. In this paper, we attempted to employ an extremely low-latency ASR model, Multi-Look-Ahead ASR by Zhao et al., to make nearly the full utterance available for response timing estimation. Additionally, we examined the effectiveness of combining low-latency ASR with a parameter called Estimates of Syntactic Completeness (ESC), which indicates how soon the user's speech will be completed. We evaluated the approach on a Japanese simulated dialog database of a restaurant information center. The results confirmed that reducing ASR delay improves the accuracy of response timing estimation. This effect also appeared when the ESC-based method was combined with low-latency ASR.
Pages: 2668-2672
Page count: 5