Mixture of experts: a literature survey

Cited by: 237
Authors
Masoudnia, Saeed [1 ]
Ebrahimpour, Reza [2 ]
Affiliations
[1] Univ Tehran, Sch Math Stat & Comp Sci, Tehran, Iran
[2] Shahid Rajaee Teacher Training Univ, Dept Elect & Comp Engn, Brain & Intelligent Syst Res Lab, Tehran, Iran
Keywords
Classifier combining; Mixture of experts; Mixture of implicitly localised experts; Mixture of explicitly localised experts; INDEPENDENT FACE RECOGNITION; NETWORK STRUCTURE; ENSEMBLE METHODS; MACHINE; CLASSIFICATION; CLASSIFIERS; ALGORITHM; MODEL
DOI
10.1007/s10462-012-9338-y
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Mixture of experts (ME) is one of the most popular and interesting combining methods, with great potential to improve performance in machine learning. ME is based on the divide-and-conquer principle, in which the problem space is divided among a few neural network experts supervised by a gating network. Earlier work on ME developed different strategies for dividing the problem space between the experts. To survey and analyse these methods more clearly, we present a categorisation of the ME literature based on this difference. Various ME implementations are classified into two groups according to the partitioning strategy used and to how and when the gating network is involved in the partitioning and combining procedures. In the first group, the conventional ME and its extensions stochastically partition the problem space into a number of subspaces using a specially designed error function, and the experts become specialised in each subspace. In the second group, the problem space is explicitly partitioned by a clustering method before the experts' training process starts, and each expert is then assigned to one of these subspaces. Because the first group partitions the problem space implicitly, through a tacit competitive process between the experts, we call it the mixture of implicitly localised experts (MILE); the second group, which relies on pre-specified clusters, is called the mixture of explicitly localised experts (MELE). The properties of the two groups are investigated and compared. This comparison of MILE and MELE, discussing the advantages and disadvantages of each group, shows that the two approaches have complementary features. Moreover, the features of the ME method are compared with those of other popular combining methods, including boosting and negative correlation learning. As the investigated methods have complementary strengths and limitations, previous studies that attempted to combine their features in integrated approaches are reviewed, and some suggestions are proposed for future research directions.
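To make the MILE/MELE distinction in the abstract concrete, the sketch below contrasts the two partitioning strategies in NumPy under illustrative assumptions that are not taken from the surveyed papers: linear experts, a linear-softmax gating network, and k-means as the explicit clustering step. The competitive error term shown for MILE is the localising objective usually attributed to Jacobs et al. (1991), offered here as one example of the kind of "specially designed error function" the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# ---- MILE: implicit localisation ------------------------------------------
# Gate and experts are trained together; the experts compete for each sample
# through the gate, so the partition of the input space emerges implicitly.
class MixtureOfExperts:
    def __init__(self, n_experts, n_features, n_outputs):
        # Linear experts and a linear-softmax gate, purely for illustration.
        self.W_experts = rng.normal(0.0, 0.1, (n_experts, n_features, n_outputs))
        self.W_gate = rng.normal(0.0, 0.1, (n_features, n_experts))

    def forward(self, X):
        expert_out = np.einsum('nf,efo->neo', X, self.W_experts)  # (N, E, O)
        gate = softmax(X @ self.W_gate)                           # (N, E)
        combined = np.einsum('ne,neo->no', gate, expert_out)      # (N, O)
        return combined, gate, expert_out

    def competitive_error(self, X, Y):
        # Localising ME objective (Jacobs et al., 1991):
        #   -log sum_i g_i(x) * exp(-0.5 * ||y - y_i(x)||^2)
        # Minimising it pushes each expert to specialise on the inputs where
        # it is already relatively accurate -- the implicit partitioning.
        _, gate, expert_out = self.forward(X)
        sq_err = ((Y[:, None, :] - expert_out) ** 2).sum(axis=-1)  # (N, E)
        lik = (gate * np.exp(-0.5 * sq_err)).sum(axis=1)
        return -np.log(lik + 1e-12).mean()

# ---- MELE: explicit localisation ------------------------------------------
# The input space is clustered *before* any expert is trained; each expert is
# then fitted only on its pre-specified cluster (k-means and least-squares
# linear experts are illustrative choices).
def train_mele(X, Y, n_experts):
    from sklearn.cluster import KMeans
    labels = KMeans(n_clusters=n_experts, n_init=10, random_state=0).fit_predict(X)
    experts = []
    for k in range(n_experts):
        Xk, Yk = X[labels == k], Y[labels == k]
        W, *_ = np.linalg.lstsq(Xk, Yk, rcond=None)  # one expert per cluster
        experts.append(W)
    return experts, labels
```

In the MILE sketch the gate and experts adapt jointly, so the partition can shift during training; in the MELE sketch the partition is frozen by the clustering step before training begins, which is the complementarity the survey discusses.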
Pages: 275-293
Number of pages: 19