Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts

Cited by: 0
Authors
Nguyen, Huy [1]
Nguyen, TrungTin [2, 3]
Nguyen, Khai [1]
Ho, Nhat [1]
Affiliations
[1] Univ Texas Austin, Dept Stat & Data Sci, Austin, TX 78712 USA
[2] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[3] Univ Grenoble Alpes, Inria, LJK, Grenoble INP, CNRS, F-38000 Grenoble, France
Keywords
MAXIMUM-LIKELIHOOD; OF-EXPERTS; HIERARCHICAL MIXTURES; FEATURE-SELECTION; IDENTIFIABILITY; REGRESSION; MODELS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Originally introduced as a neural network for ensemble learning, mixture of experts (MoE) has recently become a fundamental building block of highly successful modern deep neural networks for heterogeneous data analysis in several applications of machine learning and statistics. Despite its popularity in practice, the theoretical understanding of the MoE model remains far from complete. To shed new light on this problem, we provide a convergence analysis for maximum likelihood estimation (MLE) in the Gaussian-gated MoE model. The main challenge of this analysis comes from the inclusion of covariates in the Gaussian gating functions and expert networks, which leads to their intrinsic interaction via some partial differential equations with respect to their parameters. We tackle these issues by designing novel Voronoi loss functions among parameters to accurately capture the heterogeneity of parameter estimation rates. Our findings reveal that the MLE behaves differently under two complementary settings of the location parameters of the Gaussian gating functions, namely when all these parameters are non-zero versus when at least one of them vanishes. Notably, these behaviors can be characterized by the solvability of two different systems of polynomial equations. Finally, we conduct a simulation study to empirically verify our theoretical results.
Pages: 33
Related papers
50 in total
  • [31] ML-DC ALGORITHM OF PARAMETER ESTIMATION FOR GAUSSIAN MIXTURE AUTOREGRESSIVE MODEL
    Liu Feng
    Wang Pingbo
    Ma Chaoyang
    Hong Lixue
    2011 3RD INTERNATIONAL CONFERENCE ON COMPUTER TECHNOLOGY AND DEVELOPMENT (ICCTD 2011), VOL 2, 2012, : 215 - 221
  • [32] Distributed Adaptive LMF Algorithm for Sparse Parameter Estimation in Gaussian Mixture Noise
    Hajiabadi, Mojtaba
    Zamiri-Jafarian, Hossein
    2014 7th International Symposium on Telecommunications (IST), 2014, : 1046 - 1049
  • [33] The Baum-Welch algorithm for parameter estimation of Gaussian autoregressive mixture models
    Benesch T.
    Journal of Mathematical Sciences, 2001, 105 (6) : 2515 - 2518
  • [34] Genetic algorithm and expectation maximization for parameter estimation of mixture Gaussian model phantom
    Nasab, NM
    Analoui, M
    MEDICAL IMAGING 2002: IMAGE PROCESSING, VOL 1-3, 2002, 4684 : 864 - 871
  • [35] Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models
    Manole, Tudor
    Ho, Nhat
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [36] Parameter Estimation in Gaussian Mixture Models with Malicious Noise, without Balanced Mixing Coefficients
    Xu, Jing
    Marecek, Jakub
    2018 56TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2018, : 446 - 453
  • [37] Gaussian mixture parameter estimation with known means and unknown class-dependent variances
    Dattatreya, GR
    PATTERN RECOGNITION, 2002, 35 (07) : 1611 - 1616
  • [38] A Novel Framework for Parameter and State Estimation of Multicellular Systems Using Gaussian Mixture Approximations
    Duerr, Robert
    Waldherr, Steffen
    PROCESSES, 2018, 6 (10):
  • [39] Parameter Estimation for von Mises-Fisher Mixture Model via Gaussian Distribution
    Yasutomi, Suguru
    Tanaka, Toshihisa
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [40] Rainfall-Rate Estimation Using Gaussian Mixture Parameter Estimator: Training and Validation
    Li, Zhengzheng
    Zhang, Yan
    Giangrande, Scott E.
    JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY, 2012, 29 (05) : 731 - 744