Cross-Modality Diffusion Modeling and Sampling for Speech Recognition

Cited by: 0

Authors:

Yeh, Chia-Kai [1]

Chen, Chih-Chun [1]

Hsu, Ching-Hsieh [1]

Chien, Jen-Tzung [1]

Affiliations:

[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan

Source:

INTERSPEECH 2024 | 2024

Keywords:

speech recognition; diffusion model; feature decorrelation; fast sampling

DOI:

10.21437/Interspeech.2024-1898

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The diffusion model excels as a generative model for continuous data within a single modality. To extend its effectiveness to speech recognition, where the continuous speech frames are used as the condition to generate the discrete word tokens, building a conditional diffusion across discrete state space becomes crucial. This paper introduces a non-autoregressive discrete diffusion model, enabling parallel generation of a word string corresponding to a speech signal through iterative diffusion steps. An acoustic transformer encoder identifies the speech representation, serving as the condition for a denoising transformer decoder to predict the whole discrete sequence. To address the redundancy reduction in cross-modality diffusion, an additional feature decorrelation objective is integrated during optimization. This paper further reduces the inference time by using a fast sampling approach. The experiments on speech recognition illustrate the merit of the proposed method.


Pages: 3924-3928

Number of pages: 5
