Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay

Cited by: 0
Authors
Sakuma, Jin [1 ]
Fujie, Shinya [1 ,2 ]
Zhao, Huaibo [1 ]
Kobayashi, Tetsunori [1 ]
Affiliations
[1] Waseda Univ, Tokyo, Japan
[2] Chiba Inst Technol, Chiba, Japan
Source
Keywords
spoken dialog systems; turn-taking; response timing; streaming ASR
DOI
10.21437/Interspeech.2023-1618
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
In conversational systems, the proper timing of the system's response is critical to maintaining a comfortable conversation. Accurate timing estimation requires knowing what the user has said, including their most recent words, but ASR delay usually prevents the use of the full user utterance. In this paper, we employ an extremely low-latency ASR model, the Multi-Look-Ahead ASR of Zhao et al., to make nearly the full utterance available for response timing estimation. Additionally, we examine the effectiveness of combining low-latency ASR with a parameter called the Estimate of Syntactic Completeness (ESC), which indicates how soon the user's speech will be completed. We evaluated our approach on a Japanese simulated-dialog database of a restaurant information center. The results confirm that reducing ASR delay improves the accuracy of response timing estimation, and that this effect persists when the ESC-based method is combined with low-latency ASR.
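To make the idea concrete, here is a minimal, hypothetical sketch of a response-timing decision loop. The `toy_esc` scorer, the thresholds, and the decision rule are illustrative stand-ins only (the paper's actual ESC is a learned model over streaming ASR partials), but they show how a syntactic-completeness estimate and a silence gap can jointly gate the system's response:

```python
# Hypothetical sketch; toy_esc, the thresholds, and should_respond are
# illustrative assumptions, not the paper's actual model.

def toy_esc(partial: str) -> float:
    """Toy stand-in for an Estimate of Syntactic Completeness (ESC).

    Returns a score in [0, 1]; this naive version treats an utterance
    ending in sentence-final punctuation or a Japanese polite ending
    as complete, and anything else as likely mid-utterance.
    """
    text = partial.strip()
    if not text:
        return 0.0
    if text.endswith((".", "?", "です", "ます")):
        return 1.0
    return 0.3


def should_respond(partial: str, silence_ms: int,
                   esc_threshold: float = 0.8,
                   min_silence_ms: int = 200) -> bool:
    """Fire a system response once the latest low-latency ASR partial
    looks syntactically complete and a short silence has elapsed."""
    return toy_esc(partial) >= esc_threshold and silence_ms >= min_silence_ms


# Partials from a streaming ASR arrive incrementally, paired here with
# the silence (ms) observed since the last speech frame:
stream = [("is there an italian", 0),
          ("is there an italian restaurant", 50),
          ("is there an italian restaurant nearby?", 350)]
decisions = [should_respond(p, s) for p, s in stream]
print(decisions)  # only the final, complete partial triggers a response
```

The point the sketch makes is the paper's central one: the lower the ASR latency, the closer the final partial hypothesis is to the true end of the utterance, so a completeness-based timing decision can fire promptly rather than after a long fixed delay.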
Pages: 2668-2672 (5 pages)
Related papers (50 total)
  • [21] Caller Response Timing Patterns in Spoken Dialog Systems
    Witt, Silke M.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 326 - 329
  • [22] Automated Recognition of Paralinguistic Signals in Spoken Dialogue Systems: Ways of Improvement
    Sidorov, Maxim
    Schmitt, Alexander
    Semenkin, Eugene S.
    JOURNAL OF SIBERIAN FEDERAL UNIVERSITY-MATHEMATICS & PHYSICS, 2015, 8 (02): : 208 - 216
  • [23] Response Timing Estimation for Spoken Dialog System using Dialog Act Estimation
    Sakuma, Jin
    Fujie, Shinya
    Kobayashi, Tetsunori
    INTERSPEECH 2022, 2022, : 4486 - 4490
  • [24] Investigating Human Speech Processing as a Model for Spoken Dialogue Systems: An Experimental Framework
    Hacker, Martin
    Elsweiler, David
    Ludwig, Bernd
    ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2010, 215 : 1137 - 1138
  • [25] Estimation of Speech Intelligibility Using Speech Recognition Systems
    Takano, Yusuke
    Kondo, Kazuhiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (12): : 3368 - 3376
  • [26] Analysis and effect of speaking style for dialogue speech recognition
    Aono, K
    Yasuda, K
    Takezawa, T
    Yamamoto, S
    Yanagida, M
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 339 - 344
  • [27] Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
    Song, Kaitao
    Wan, Teng
    Wang, Bixia
    Jiang, Huiqiang
    Qiu, Luna
    Xu, Jiahang
    Jiang, Liping
    Lou, Qun
    Yang, Yuqing
    Li, Dongsheng
    Wang, Xudong
    Qiu, Lili
    INTERSPEECH 2022, 2022, : 4820 - 4824
  • [28] Quantifying and Improving the Performance of Speech Recognition Systems on Dysphonic Speech
    Lopez, Julio C. Hidalgo C.
    Sandeep, Shelly
    Wright, MaKayla
    Wandell, Grace M. M.
    Law, Anthony B. B.
    OTOLARYNGOLOGY-HEAD AND NECK SURGERY, 2023, 168 (05) : 1130 - 1138
  • [29] Speech understanding for spoken dialogue systems: From corpus harvesting to grammar rule induction
    Iosif, Elias
    Klasinas, Ioannis
    Athanasopoulou, Georgia
    Palogiannidi, Elisavet
    Georgiladakis, Spiros
    Louka, Katerina
    Potamianos, Alexandros
    COMPUTER SPEECH AND LANGUAGE, 2018, 47 : 272 - 297
  • [30] Integrating topic estimation and dialogue history for domain selection in multi-domain spoken dialogue systems
    Ikeda, Satoshi
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    NEW FRONTIERS IN APPLIED ARTIFICIAL INTELLIGENCE, 2008, 5027 : 294 - 304