It's Raw! Audio Generation with State-Space Models

Authors
Goel, Karan [1]
Gu, Albert [1]
Donahue, Chris [1]
Re, Christopher [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Developing architectures suitable for modeling raw audio is a challenging problem due to the high sampling rates of audio waveforms. Standard sequence modeling approaches like RNNs and CNNs have previously been tailored to fit the demands of audio, but the resultant architectures make undesirable computational tradeoffs and struggle to model waveforms effectively. We propose SASHIMI, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling. We identify that S4 can be unstable during autoregressive generation, and provide a simple improvement to its parameterization by drawing connections to Hurwitz matrices. SASHIMI yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting. Additionally, SASHIMI improves non-autoregressive generation performance when used as the backbone architecture for a diffusion model. Compared to prior architectures in the autoregressive generation setting, SASHIMI generates piano and speech waveforms that humans find more musical and more coherent, respectively, e.g. 2x better mean opinion scores than WaveNet on an unconditional speech generation task. On a music generation task, SASHIMI outperforms WaveNet on density estimation and on speed at both training and inference, even when using 3x fewer parameters.
Pages: 18