It's Raw! Audio Generation with State-Space Models

Cited by: 0

Authors:

Goel, Karan [1]

Gu, Albert [1]

Donahue, Chris [1]

Re, Christopher [1]

Affiliations:

[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Developing architectures suitable for modeling raw audio is a challenging problem due to the high sampling rates of audio waveforms. Standard sequence modeling approaches like RNNs and CNNs have previously been tailored to fit the demands of audio, but the resultant architectures make undesirable computational tradeoffs and struggle to model waveforms effectively. We propose SASHIMI, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling. We identify that S4 can be unstable during autoregressive generation, and provide a simple improvement to its parameterization by drawing connections to Hurwitz matrices. SASHIMI yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting. Additionally, SASHIMI improves non-autoregressive generation performance when used as the backbone architecture for a diffusion model. Compared to prior architectures in the autoregressive generation setting, SASHIMI generates piano and speech waveforms which humans find more musical and coherent, respectively, e.g. 2x better mean opinion scores than WaveNet on an unconditional speech generation task. On a music generation task, SASHIMI outperforms WaveNet on density estimation and speed at both training and inference even when using 3x fewer parameters.

Pages: 18
