Tucat, M, Mukherjee, A, Sen, P, Sun, M and Rivasplata, O
(2025)
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Transactions on Machine Learning Research, 2025-J.
ISSN 2835-8856
Abstract
We present and analyze a novel regularized form of the gradient clipping algorithm, proving that it converges to global minima of the loss surface of deep neural networks under the squared loss, provided that the layers are of sufficient width. The algorithm presented here, dubbed δ-GClip, introduces a modification to gradient clipping that leads to a first-of-its-kind example of a step-size schedule for gradient descent that provably minimizes the training loss of deep neural nets. We also present empirical evidence that our theoretically founded δ-GClip algorithm is competitive with state-of-the-art deep learning heuristics on various neural architectures, including modern transformer-based architectures. The modification we make to standard gradient clipping is designed to leverage the PL* condition, a variant of the Polyak-Łojasiewicz inequality that was recently proven to hold for sufficiently wide neural networks of any depth within a neighbourhood of the initialization.
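The abstract describes δ-GClip only in words. The following is a minimal sketch, assuming (from our reading of the linked arXiv preprint, not from anything stated in this record) that the update has the form x ← x − η·min{1, max{δ, γ/‖∇L(x)‖}}·∇L(x), where η is the base step size, γ the clipping threshold and δ the regularizing floor. The function names, the toy least-squares problem (a convex stand-in, not a neural network) and the values of η, γ and δ are hypothetical illustrations, not the authors' code or settings.

```python
import numpy as np


def delta_gclip_step(x, grad, eta, gamma, delta):
    """One delta-GClip update, in the ASSUMED form described in the lead-in.

    The multiplier min(1, max(delta, gamma / ||grad||)) equals 1 for small
    gradients, gamma / ||grad|| for moderately large ones (standard norm
    clipping), and never drops below delta, so the effective step size
    stays >= eta * delta however large the gradient is.
    """
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return x  # stationary point: no update needed
    scale = min(1.0, max(delta, gamma / norm))
    return x - eta * scale * grad


def squared_loss(A, b, x):
    """Hypothetical toy objective: 0.5 * ||A x - b||^2 and its gradient."""
    r = A @ x - b
    return 0.5 * float(r @ r), A.T @ r


def train(A, b, x0, eta=0.01, gamma=1.0, delta=1e-3, steps=2000):
    """Run delta-GClip on the toy problem; return final x and loss history."""
    x = x0.copy()
    losses = []
    for _ in range(steps):
        loss, grad = squared_loss(A, b, x)
        losses.append(loss)
        x = delta_gclip_step(x, grad, eta, gamma, delta)
    losses.append(squared_loss(A, b, x)[0])
    return x, losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Over-parameterized toy problem (more unknowns than equations), so an
    # exact zero-loss solution exists, loosely mirroring the interpolation
    # regime in which PL*-type conditions are studied.
    A = rng.standard_normal((5, 20))
    b = rng.standard_normal(5)
    x, losses = train(A, b, np.zeros(20))
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.2e}")
```

Under this assumed form, the δ floor is the only change from standard norm clipping, whose multiplier is min{1, γ/‖∇L‖}: without the floor, a very large gradient would shrink the step toward zero, whereas with it the step never falls below ηδ.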
| Item Type: | Article |
|---|---|
| Divisions: | Faculty of Science & Engineering; Faculty of Science & Engineering > School of Computer Science & Informatics; Faculty of Science & Engineering > School of Computer Science & Informatics > Artificial Intelligence |
| Depositing User: | Symplectic Admin |
| Date Deposited: | 17 Oct 2025 09:50 |
| Last Modified: | 17 Oct 2025 09:50 |
| Open Access URL: | https://arxiv.org/abs/2404.08624 |
| URI: | https://livrepository.liverpool.ac.uk/id/eprint/3194864 |