Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay

Cited by: 0
Authors
Sakuma, Jin [1 ]
Fujie, Shinya [1 ,2 ]
Zhao, Huaibo [1 ]
Kobayashi, Tetsunori [1 ]
Affiliations
[1] Waseda Univ, Tokyo, Japan
[2] Chiba Inst Technol, Chiba, Japan
Source
Keywords
spoken dialog systems; turn-taking; response timing; streaming ASR
DOI
10.21437/Interspeech.2023-1618
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
In conversational systems, the proper timing of the system's response is critical to maintaining a comfortable conversation. Accurate timing estimation requires knowing what the user has said, including their most recent words, but ASR delay usually prevents the use of the full user utterance. In this paper, we employ an extremely low-latency ASR model, the Multi-Look-Ahead ASR of Zhao et al., to make nearly the full utterance available for response timing estimation. Additionally, we examine the effectiveness of combining low-latency ASR with a parameter called the Estimate of Syntactic Completeness (ESC), which indicates how soon the user's speech will be completed. We evaluated our approach on a Japanese simulated-dialog database of a restaurant information center. The results confirm that reducing ASR delay improves the accuracy of response timing estimation, and that this effect persists when the ESC-based method is combined with low-latency ASR.
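To make the idea concrete, here is a minimal, hypothetical sketch of a response-timing decision loop. The `toy_esc` scorer, the thresholds, and the decision rule are illustrative stand-ins only (the paper's actual ESC is a learned model over streaming ASR partials), but they show how a syntactic-completeness estimate and a silence gap can jointly gate the system's response:

```python
# Hypothetical sketch; toy_esc, the thresholds, and should_respond are
# illustrative assumptions, not the paper's actual model.

def toy_esc(partial: str) -> float:
    """Toy stand-in for an Estimate of Syntactic Completeness (ESC).

    Returns a score in [0, 1]; this naive version treats an utterance
    ending in sentence-final punctuation or a Japanese polite ending
    as complete, and anything else as likely mid-utterance.
    """
    text = partial.strip()
    if not text:
        return 0.0
    if text.endswith((".", "?", "です", "ます")):
        return 1.0
    return 0.3


def should_respond(partial: str, silence_ms: int,
                   esc_threshold: float = 0.8,
                   min_silence_ms: int = 200) -> bool:
    """Fire a system response once the latest low-latency ASR partial
    looks syntactically complete and a short silence has elapsed."""
    return toy_esc(partial) >= esc_threshold and silence_ms >= min_silence_ms


# Partials from a streaming ASR arrive incrementally, paired here with
# the silence (ms) observed since the last speech frame:
stream = [("is there an italian", 0),
          ("is there an italian restaurant", 50),
          ("is there an italian restaurant nearby?", 350)]
decisions = [should_respond(p, s) for p, s in stream]
print(decisions)  # only the final, complete partial triggers a response
```

The point the sketch makes is the paper's central one: the lower the ASR latency, the closer the final partial hypothesis is to the true end of the utterance, so a completeness-based timing decision can fire promptly rather than after a long fixed delay.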
Pages: 2668-2672 (5 pages)
Related papers (50 total)
  • [21] Caller Response Timing Patterns in Spoken Dialog Systems
    Witt, Silke M.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 326 - 329
  • [22] Automated Recognition of Paralinguistic Signals in Spoken Dialogue Systems: Ways of Improvement
    Sidorov, Maxim
    Schmitt, Alexander
    Semenkin, Eugene S.
    JOURNAL OF SIBERIAN FEDERAL UNIVERSITY-MATHEMATICS & PHYSICS, 2015, 8 (02): : 208 - 216
  • [23] Response Timing Estimation for Spoken Dialog System using Dialog Act Estimation
    Sakuma, Jin
    Fujie, Shinya
    Kobayashi, Tetsunori
    INTERSPEECH 2022, 2022, : 4486 - 4490
  • [24] Investigating Human Speech Processing as a Model for Spoken Dialogue Systems: An Experimental Framework
    Hacker, Martin
    Elsweiler, David
    Ludwig, Bernd
    ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2010, 215 : 1137 - 1138
  • [25] Estimation of Speech Intelligibility Using Speech Recognition Systems
    Takano, Yusuke
    Kondo, Kazuhiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (12): : 3368 - 3376
  • [26] Analysis and effect of speaking style for dialogue speech recognition
    Aono, K
    Yasuda, K
    Takezawa, T
    Yamamoto, S
    Yanagida, M
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 339 - 344
  • [27] Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
    Song, Kaitao
    Wan, Teng
    Wang, Bixia
    Jiang, Huiqiang
    Qiu, Luna
    Xu, Jiahang
    Jiang, Liping
    Lou, Qun
    Yang, Yuqing
    Li, Dongsheng
    Wang, Xudong
    Qiu, Lili
    INTERSPEECH 2022, 2022, : 4820 - 4824
  • [28] Quantifying and Improving the Performance of Speech Recognition Systems on Dysphonic Speech
    Lopez, Julio C. Hidalgo C.
    Sandeep, Shelly
    Wright, MaKayla
    Wandell, Grace M. M.
    Law, Anthony B. B.
    OTOLARYNGOLOGY-HEAD AND NECK SURGERY, 2023, 168 (05) : 1130 - 1138
  • [29] Speech understanding for spoken dialogue systems: From corpus harvesting to grammar rule induction
    Iosif, Elias
    Klasinas, Ioannis
    Athanasopoulou, Georgia
    Palogiannidi, Elisavet
    Georgiladakis, Spiros
    Louka, Katerina
    Potamianos, Alexandros
    COMPUTER SPEECH AND LANGUAGE, 2018, 47 : 272 - 297
  • [30] Integrating topic estimation and dialogue history for domain selection in multi-domain spoken dialogue systems
    Ikeda, Satoshi
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    NEW FRONTIERS IN APPLIED ARTIFICIAL INTELLIGENCE, 2008, 5027 : 294 - 304