Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay

Cited by: 0

Authors:

Sakuma, Jin [1]

Fujie, Shinya [1,2]

Zhao, Huaibo [1]

Kobayashi, Tetsunori [1]

Affiliations:

[1] Waseda Univ, Tokyo, Japan

[2] Chiba Inst Technol, Chiba, Japan

Source:

INTERSPEECH 2023 | 2023

Keywords:

spoken dialog systems; turn-taking; response timing; streaming ASR

DOI:

10.21437/Interspeech.2023-1618

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In conversational systems, the proper timing of the system's response is critical to maintaining a comfortable conversation. To estimate appropriate timing, it is important to know what the user has said, including the most recent words, but ASR delay usually prevents the use of the full user utterance. In this paper, we attempted to employ an extremely low-latency ASR model, Multi-Look-Ahead ASR by Zhao et al., to make nearly the full utterance available for response timing estimation. Additionally, we examined the effectiveness of combining low-latency ASR with a parameter called Estimates of Syntactic Completeness (ESC), which indicates how soon the user's speech will be completed. We evaluated the approach on a Japanese simulated dialog database of a restaurant information center. The results confirmed that reducing ASR delay improves the accuracy of response timing estimation. This effect also appeared when the ESC-based method was combined with low-latency ASR.

Pages: 2668-2672

Page count: 5
