Scalable Factorized Hierarchical Variational Autoencoder Training

Cited by: 8
Authors
Hsu, Wei-Ning [1 ]
Glass, James [1 ]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
unsupervised learning; speech representation learning; factorized hierarchical variational autoencoder;
DOI
10.21437/Interspeech.2018-1034
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations. Among them, a factorized hierarchical variational autoencoder (FHVAE) is a variational inference-based model that formulates a hierarchical generative process for sequential data. Specifically, an FHVAE model can learn disentangled and interpretable representations, which have proven useful for numerous speech applications, such as speaker verification, robust speech recognition, and voice conversion. However, as we will elaborate in this paper, the training algorithm proposed in the original paper does not scale to datasets of thousands of hours, which makes this model less applicable on a larger scale. After identifying limitations in terms of runtime, memory, and hyperparameter optimization, we propose a hierarchical sampling training algorithm that addresses all three issues. Our proposed method is evaluated comprehensively on a wide variety of datasets, ranging from 3 to 1,000 hours and involving different types of generating factors, such as recording conditions and noise types. In addition, we present a new visualization method for qualitatively evaluating performance with respect to interpretability and disentanglement. Models trained with our proposed algorithm demonstrate the desired characteristics on all the datasets.
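The hierarchical sampling idea described in the abstract can be sketched roughly as follows. This is a minimal illustration of the training-loop structure only, not the authors' code: it assumes each round samples a small subset of sequences and maintains sequence-level cache entries only for that subset, so cache memory stays bounded by the subset size rather than growing with the corpus. All function and variable names here are our own.

```python
import random

def train_with_hierarchical_sampling(dataset, k, inner_steps, rounds,
                                     train_step, seed=0):
    """Illustrative hierarchical-sampling training loop.

    dataset: dict mapping sequence id -> list of segments.
    Each round samples k sequences (level 1), then repeatedly draws
    segments from within that subset (level 2). The per-sequence
    cache exists only for the current subset, so memory is O(k)
    rather than O(number of sequences in the corpus).
    """
    rng = random.Random(seed)
    seq_ids = list(dataset)
    for _ in range(rounds):
        subset = rng.sample(seq_ids, k)       # level 1: sample k sequences
        cache = {s: 0.0 for s in subset}      # stand-in for per-sequence stats
        for _ in range(inner_steps):
            seq = rng.choice(subset)          # level 2: segment within subset
            segment = rng.choice(dataset[seq])
            train_step(seq, segment, cache)   # user-supplied model update
        # cache for this subset is discarded here before the next round
```

In an actual FHVAE trainer, `train_step` would compute the variational lower bound for the segment and update both the model parameters and the cached sequence-level posterior statistics; the point of the sketch is only that the cache never covers more than `k` sequences at a time.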
Pages: 1462-1466
Page count: 5