Learning curves for the multi-class teacher-student perceptron

Cited by: 6
Authors
Cornacchia, Elisabetta [1]
Mignacco, Francesca [2,3,4]
Veiga, Rodrigo [5,6]
Gerbelot, Cedric [7]
Loureiro, Bruno [5,8,9]
Zdeborova, Lenka [10]
Affiliations
[1] Ecole Polytech Fed Lausanne EPFL, Math Data Sci MDS lab, Lausanne, Switzerland
[2] Univ Paris Saclay, Inst Phys theor, CNRS, CEA, Saclay, France
[3] Princeton Univ, Princeton, NJ 08544 USA
[4] CUNY, New York, NY 10017 USA
[5] Ecole Polytech Fed Lausanne EPFL, Informat Learning & Phys IdePH lab, Lausanne, Switzerland
[6] Univ Sao Paulo, Inst Fis, Sao Paulo, Brazil
[7] CUNY, Courant Inst Math Sci, New York, NY USA
[8] Ecole Normale Super PSL & CNRS, Paris, France
[9] CNRS, Paris, France
[10] Ecole Polytech Fed Lausanne EPFL, Stat Phys Computat SPOC lab, Lausanne, Switzerland
Keywords
multi-class classification; empirical risk minimization; high-dimensional statistics; message-passing algorithms; statistical mechanics
DOI
10.1088/2632-2153/acb428
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with a single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal (BO) estimation and empirical risk minimisation (ERM) have been extensively analysed in this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet an analogous analysis for the multi-class teacher-student perceptron was still missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for the BO and ERM generalisation errors in the high-dimensional regime. For a Gaussian teacher, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. Instead, for a Rademacher teacher, we show that a first-order phase transition arises in the BO performance.
Pages: 35