Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss

Cited by: 0
Authors
Brechet, Pierre [1 ]
Papagiannouli, Katerina [1 ]
An, Jing [2 ]
Montufar, Guido [1 ,3 ,4 ]
Affiliations
[1] Max Planck Inst Math Sci, Leipzig, Germany
[2] Duke Univ, Dept Math, Durham, NC USA
[3] UCLA, Dept Math, Los Angeles, CA USA
[4] UCLA, Dept Stat, Los Angeles, CA USA
Funding
European Research Council;
Keywords
LOW-RANK APPROXIMATION; MATRIX;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. The Hessian of this loss at low-rank matrices can theoretically blow up, which creates challenges for analyzing the convergence of gradient optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss, as well as for finite step size gradient descent under certain assumptions on the initial weights.
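For context, the squared Bures-Wasserstein distance between positive semidefinite matrices Sigma_1 and Sigma_2 is the standard expression tr(Sigma_1) + tr(Sigma_2) - 2 tr((Sigma_1^{1/2} Sigma_2 Sigma_1^{1/2})^{1/2}). The sketch below evaluates this loss for a covariance produced by a deep linear factorization; it is a minimal illustration under stated assumptions (depth, dimensions, and the end-to-end parametrization Sigma(W) = (W_N ... W_1)(W_N ... W_1)^T are illustrative choices, not details taken from the paper).

```python
# Minimal sketch, not the authors' code: squared Bures-Wasserstein loss between a
# covariance produced by a deep linear factorization and a target covariance.
# Dimensions, depth, and the end-to-end parametrization are illustrative assumptions.
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_sq(sigma1, sigma2):
    """Squared Bures-Wasserstein distance between two PSD matrices."""
    root1 = sqrtm(sigma1)
    cross = sqrtm(root1 @ sigma2 @ root1)
    # sqrtm may return tiny imaginary parts for (nearly) singular inputs; keep the real part.
    return float(np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.real(np.trace(cross)))

rng = np.random.default_rng(0)
d, k = 5, 3                                    # ambient dimension and bottleneck rank
factors = [rng.standard_normal((d, k)),        # W_1
           rng.standard_normal((d, d)),        # W_2
           rng.standard_normal((d, d))]        # W_3
end_to_end = factors[2] @ factors[1] @ factors[0]     # W_3 W_2 W_1, a d x k matrix
sigma_model = end_to_end @ end_to_end.T               # rank-bounded model covariance (rank <= k)
sigma_target = np.cov(rng.standard_normal((d, 200)))  # target covariance from synthetic data

print(bures_wasserstein_sq(sigma_model, sigma_target))
```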
Pages: 42
Related papers
6 items
  • [1] A Distributed Conditional Wasserstein Deep Convolutional Relativistic Loss Generative Adversarial Network with Improved Convergence
    Roy, A.
    Dasgupta, D.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (09): 1 - 10
  • [2] On Non-local Convergence Analysis of Deep Linear Networks
    Chen, Kun
    Lin, Dachao
    Zhang, Zhihua
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [3] Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks
    Qin, Zhen
    Tan, Xuwei
    Zhu, Zhihui
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 795 - 799
  • [4] The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis
    Achour, El Mehdi
    Malgouyres, Francois
    Gerchinovitz, Sebastien
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 76
  • [5] A convergence analysis of Nesterov's accelerated gradient method in training deep linear neural networks
    Liu, Xin
    Tao, Wei
    Pan, Zhisong
    INFORMATION SCIENCES, 2022, 612 : 898 - 925
  • [6] Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks
    Shin, Yeonjong
    ANALYSIS AND APPLICATIONS, 2022, 20 (01) : 73 - 119