Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss

Cited by: 0
Authors
Brechet, Pierre [1 ]
Papagiannouli, Katerina [1 ]
An, Jing [2 ]
Montufar, Guido [1 ,3 ,4 ]
Affiliations
[1] Max Planck Inst Math Sci, Leipzig, Germany
[2] Duke Univ, Dept Math, Durham, NC USA
[3] UCLA, Dept Math, Los Angeles, CA USA
[4] UCLA, Dept Stat, Los Angeles, CA USA
Funding
European Research Council;
Keywords
LOW-RANK APPROXIMATION; MATRIX;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. The Hessian of this loss at low-rank matrices can theoretically blow up, which creates challenges for analyzing the convergence of gradient optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss, as well as for finite step size gradient descent under certain assumptions on the initial weights.
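For context, the squared Bures-Wasserstein distance between positive semidefinite matrices Sigma_1 and Sigma_2 is the standard expression tr(Sigma_1) + tr(Sigma_2) - 2 tr((Sigma_1^{1/2} Sigma_2 Sigma_1^{1/2})^{1/2}). The sketch below evaluates this loss for a covariance produced by a deep linear factorization; it is a minimal illustration under stated assumptions (depth, dimensions, and the end-to-end parametrization Sigma(W) = (W_N ... W_1)(W_N ... W_1)^T are illustrative choices, not details taken from the paper).

```python
# Minimal sketch, not the authors' code: squared Bures-Wasserstein loss between a
# covariance produced by a deep linear factorization and a target covariance.
# Dimensions, depth, and the end-to-end parametrization are illustrative assumptions.
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_sq(sigma1, sigma2):
    """Squared Bures-Wasserstein distance between two PSD matrices."""
    root1 = sqrtm(sigma1)
    cross = sqrtm(root1 @ sigma2 @ root1)
    # sqrtm may return tiny imaginary parts for (nearly) singular inputs; keep the real part.
    return float(np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.real(np.trace(cross)))

rng = np.random.default_rng(0)
d, k = 5, 3                                    # ambient dimension and bottleneck rank
factors = [rng.standard_normal((d, k)),        # W_1
           rng.standard_normal((d, d)),        # W_2
           rng.standard_normal((d, d))]        # W_3
end_to_end = factors[2] @ factors[1] @ factors[0]     # W_3 W_2 W_1, a d x k matrix
sigma_model = end_to_end @ end_to_end.T               # rank-bounded model covariance (rank <= k)
sigma_target = np.cov(rng.standard_normal((d, 200)))  # target covariance from synthetic data

print(bures_wasserstein_sq(sigma_model, sigma_target))
```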
Pages: 42
Related papers
6 items
  • [1] A Distributed Conditional Wasserstein Deep Convolutional Relativistic Loss Generative Adversarial Network with Improved Convergence
    Roy, A.
    Dasgupta, D.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (09): 1 - 10
  • [2] On Non-local Convergence Analysis of Deep Linear Networks
    Chen, Kun
    Lin, Dachao
    Zhang, Zhihua
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [3] Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks
    Qin, Zhen
    Tan, Xuwei
    Zhu, Zhihui
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 795 - 799
  • [4] The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis
    Achour, El Mehdi
    Malgouyres, Francois
    Gerchinovitz, Sebastien
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 76
  • [5] A convergence analysis of Nesterov's accelerated gradient method in training deep linear neural networks
    Liu, Xin
    Tao, Wei
    Pan, Zhisong
    INFORMATION SCIENCES, 2022, 612 : 898 - 925
  • [6] Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks
    Shin, Yeonjong
    ANALYSIS AND APPLICATIONS, 2022, 20 (01) : 73 - 119