Spatial Mixture-of-Experts

Cited by: 0
Authors
Dryden, Nikoli [1 ]
Hoefler, Torsten [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
Funding
EU Horizon 2020;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many data have an underlying dependence on spatial location; it may be weather on the Earth, a simulation on a mesh, or a registered image. Yet this feature is rarely taken advantage of, and violates common assumptions made by many neural network layers, such as translation equivariance. Further, many works that do incorporate locality fail to capture fine-grained structure. To address this, we introduce the Spatial Mixture-of-Experts (SMOE) layer, a sparsely-gated layer that learns spatial structure in the input domain and routes experts at a fine-grained level to utilize it. We also develop new techniques to train SMOEs, including a self-supervised routing loss and damping expert errors. Finally, we show strong results for SMOEs on numerous tasks, and set new state-of-the-art results for medium-range weather prediction and post-processing ensemble weather forecasts.
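The abstract describes per-location routing: unlike a standard mixture-of-experts that selects experts per example, an SMOE selects an expert at each spatial position, so different regions of the input can use different parameters. Below is a minimal PyTorch sketch of that idea; the class name, the 1x1-convolution experts, the input-conditioned gate, and top-1 routing are illustrative assumptions, not the authors' implementation (which additionally trains with the self-supervised routing loss and damped expert errors mentioned above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialMoE(nn.Module):
    """Toy spatially-gated mixture-of-experts layer (illustrative sketch)."""

    def __init__(self, in_ch: int, out_ch: int, num_experts: int):
        super().__init__()
        # Experts: per-location linear maps implemented as 1x1 convolutions.
        self.experts = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=1) for _ in range(num_experts)
        )
        # Gate: one routing score per expert at every spatial location.
        self.gate = nn.Conv2d(in_ch, num_experts, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_ch, height, width)
        weights = F.softmax(self.gate(x), dim=1)   # (B, E, H, W)
        top_w, top_idx = weights.max(dim=1)        # top-1 expert per location

        # Evaluate every expert densely; a real implementation would
        # dispatch each location only to its selected expert for sparsity.
        out_all = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C', H, W)

        # Gather each location's chosen expert output and scale by the
        # gate weight so the routing decision stays differentiable.
        idx = top_idx[:, None, None].expand(-1, 1, out_all.size(2), -1, -1)
        out = out_all.gather(1, idx).squeeze(1)    # (B, C', H, W)
        return out * top_w.unsqueeze(1)
```

Evaluating all experts densely keeps the sketch short; the efficiency the abstract implies comes from sparse dispatch, where each grid cell is processed only by its routed expert.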
Pages: 17
Related Papers
50 records in total
  • [41] Hierarchical mixture-of-experts models for count variables with excessive zeros
    Park, Myung Hyun
    Kim, Joseph H. T.
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2022, 51 (12) : 4072 - 4096
  • [42] Distributed Mixture-of-Experts for Big Data using PETUUM framework
    Peralta, Billy
    Parra, Luis
    Herrera, Oriel
    Caro, Luis
    2017 36TH INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY (SCCC), 2017,
  • [43] Mixture-of-experts for handwriting trajectory reconstruction from IMU sensors
    Imbert, Florent
    Anquetil, Eric
    Soullard, Yann
    Tavenard, Romain
    PATTERN RECOGNITION, 2025, 161
  • [44] Extension of mixture-of-experts networks for binary classification of hierarchical data
    Ng, Shu-Kay
    McLachlan, Geoffrey J.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2007, 41 (01) : 57 - 67
  • [45] DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
    Jain, Yash
    Behl, Harkirat
    Kira, Zsolt
    Vineet, Vibhav
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [46] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
    Lu, Xudong
    Liu, Qi
    Xu, Yuhui
    Zhou, Aojun
    Huang, Siyuan
    Zhang, Bo
    Yan, Junchi
    Li, Hongsheng
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024 : 6159 - 6172
  • [47] Steered Mixture-of-Experts for Light Field Images and Video: Representation and Coding
    Verhack, Ruben
    Sikora, Thomas
    Van Wallendael, Glenn
    Lambert, Peter
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (03) : 579 - 593
  • [48] Real-Time Scheduling of Mixture-of-Experts Systems with Limited Resources
    Rattanatamrong, Prapaporn
    Fortes, Jose A. B.
    HSCC '10: PROCEEDINGS OF THE 13TH ACM INTERNATIONAL CONFERENCE ON HYBRID SYSTEMS: COMPUTATION AND CONTROL, 2010 : 71 - 80
  • [49] Adaptive mixture-of-experts models for data glove interface with multiple users
    Yoon, Jong-Won
    Yang, Sung-Ihk
    Cho, Sung-Bae
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (05) : 4898 - 4907
  • [50] Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
    Park, Byeongjun
    Go, Hyojun
    Kim, Jin-Young
    Woo, Sangmin
    Ham, Seokil
    Kim, Changick
    COMPUTER VISION - ECCV 2024, PT LIII, 2025, 15111 : 461 - 477