Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

Cited by: 13
Authors
Asif, Umar [1 ]
Tang, Jianbin [1 ]
Harrer, Stefan [1 ]
Affiliations
[1] IBM Res Australia, Southbank, Vic, Australia
Keywords
DOI
10.3233/FAIA200188
CLC classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Ensemble models comprising deep Convolutional Neural Networks (CNNs) have shown significant improvements in model generalization, but at the cost of large computation and memory requirements. In this paper, we present a framework for learning compact CNN models with improved classification performance and model generalization. To this end, we propose a compact student architecture with parallel branches that are trained using ground-truth labels and information from high-capacity teacher networks in an ensemble-learning fashion. Our framework provides two main benefits: i) distilling knowledge from different teachers into the student network promotes heterogeneity in the features learned at different branches and enables the network to learn diverse solutions to the target problem; ii) coupling the branches of the student network through ensembling encourages collaboration and improves the quality of the final predictions by reducing the variance of the network outputs. Experiments on the well-established CIFAR-10 and CIFAR-100 datasets show that our Ensemble Knowledge Distillation (EKD) improves classification accuracy and model generalization, especially when training data are limited. Experiments also show that our EKD-based compact networks achieve higher mean accuracy on the test datasets than other knowledge-distillation-based methods.
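To make the abstract's idea concrete, below is a minimal PyTorch sketch of branch-wise ensemble knowledge distillation, not the authors' exact implementation. The one-teacher-per-branch pairing, the mean-of-logits ensembling, the temperature `T`, and the loss weight `alpha` are illustrative assumptions.

```python
# Minimal sketch of ensemble knowledge distillation with a branched student.
# Assumptions (not taken from the paper): each student branch is paired with
# one frozen teacher, the ensemble prediction is the mean of branch logits,
# and the loss is CE on the ensemble plus a per-branch KD term.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BranchedStudent(nn.Module):
    """Compact student: shared convolutional trunk + parallel classifier branches."""

    def __init__(self, num_classes=10, num_branches=3, width=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.branches = nn.ModuleList(
            [nn.Linear(width, num_classes) for _ in range(num_branches)]
        )

    def forward(self, x):
        h = self.trunk(x)
        branch_logits = [b(h) for b in self.branches]          # one prediction per branch
        ensemble_logits = torch.stack(branch_logits).mean(0)   # coupled final prediction
        return branch_logits, ensemble_logits


def ekd_loss(branch_logits, ensemble_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy on the ensembled output plus per-branch distillation
    from the corresponding teacher's pre-computed logits."""
    ce = F.cross_entropy(ensemble_logits, labels)
    kd = 0.0
    for s_logits, t_logits in zip(branch_logits, teacher_logits):
        kd = kd + F.kl_div(
            F.log_softmax(s_logits / T, dim=1),
            F.softmax(t_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
    kd = kd / len(branch_logits)
    return (1 - alpha) * ce + alpha * kd
```

In this sketch each branch receives soft targets from a different teacher, which is one way to induce the feature heterogeneity described above, while the cross-entropy term on the averaged logits couples the branches into a single ensembled prediction.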
Pages: 953 - 960
Number of pages: 8
Related papers (50 in total)
  • [1] Efficient Knowledge Distillation from an Ensemble of Teachers. Fukuda, Takashi; Suzuki, Masayuki; Kurata, Gakuto; Thomas, Samuel; Cui, Jia; Ramabhadran, Bhuvana. 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction, 2017: 3697-3701.
  • [2] Communication-efficient Federated Learning for UAV Networks with Knowledge Distillation and Transfer Learning. Li, Yalong; Wu, Celimuge; Du, Zhaoyang; Zhong, Lei; Yoshinaga, Tsutomu. IEEE Conference on Global Communications (GLOBECOM), 2023: 5739-5744.
  • [3] "In-Network Ensemble": Deep Ensemble Learning with Diversified Knowledge Distillation. Li, Xingjian; Xiong, Haoyi; Chen, Zeyu; Huan, Jun; Xu, Cheng-Zhong; Dou, Dejing. ACM Transactions on Intelligent Systems and Technology, 2021, 12(05).
  • [4] Periodic Intra-ensemble Knowledge Distillation for Reinforcement Learning. Hong, Zhang-Wei; Nagarajan, Prabhat; Maeda, Guilherme. Machine Learning and Knowledge Discovery in Databases, 2021, 12975: 87-103.
  • [5] Adaptive ensemble learning for efficient keyphrase extraction: Diagnosis, and distillation. Zhang, Kai; Gang, Hongbo; Hu, Feng; Yu, Runlong; Liu, Qi. Expert Systems with Applications, 2025, 278.
  • [6] Feature-Level Ensemble Knowledge Distillation for Aggregating Knowledge from Multiple Networks. Park, SeongUk; Kwak, Nojun. ECAI 2020: 24th European Conference on Artificial Intelligence, 2020, 325: 1411-1418.
  • [7] Improved knowledge distillation method with curriculum learning paradigm. Zhang, S.; Wang, C.; Yang, K.; Luo, X.; Wu, C.; Li, Q. Jisuanji Jicheng Zhizao Xitong / Computer Integrated Manufacturing Systems (CIMS), 2022, 28(07): 2075-2082.
  • [8] An improved ensemble machine learning classifier for efficient spectrum sensing in cognitive radio networks. Sivagurunathan, P. T.; Sathishkumar, S. International Journal of Communication Systems, 2024, 37(02).
  • [9] An efficient welding state monitoring model for robotic welding based on ensemble learning and generative adversarial knowledge distillation. Xiao, Runquan; Zhu, Kanghong; Liu, Qiang; Chen, Huabin; Chen, Shanben. Measurement, 2025, 242.
  • [10] Learning Efficient Object Detection Models with Knowledge Distillation. Chen, Guobin; Choi, Wongun; Yu, Xiang; Han, Tony; Chandraker, Manmohan. Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017, 30.