Predicting the success of Gradient Descent for a particular Dataset-Architecture-Initialization (DAI)

Jain, Umangi; Ramaswamy, Harish G.

arXiv:2111.13075 (cs)

[Submitted on 25 Nov 2021]

Abstract: Despite the massive success of deep neural networks, training them still largely relies on experimentally choosing an architecture, hyper-parameters, initialization, and training mechanism. In this work, we focus on determining the success of the standard gradient descent method for training deep neural networks on a specified dataset, architecture, and initialization (DAI) combination. Through extensive systematic experiments, we show that the evolution of the singular values of the matrix obtained from the hidden layers of a DNN can aid in determining the success of gradient descent in training a DAI, even in the absence of validation labels in the supervised learning paradigm. This phenomenon can facilitate early give-up: stopping, early in the training process, the training of neural networks that are predicted not to generalize well. Our experimentation across multiple datasets, architectures, and initializations reveals that the proposed scores can predict the success of a DAI more accurately than simply relying on the validation accuracy at earlier epochs.
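As a rough illustration of the kind of quantity the abstract refers to, the sketch below computes the singular values of a hidden-layer activation matrix with NumPy and condenses them into a single number. The abstract does not define the proposed scores, so everything here is an assumption made for illustration: the function names `hidden_singular_values` and `effective_rank`, the choice of an entropy-based effective rank as the summary, and the toy activation matrices. It is not the authors' method.

```python
import numpy as np


def hidden_singular_values(activations):
    """Return the singular values (descending) of an (n_samples, n_units) matrix.

    `activations` stands in for the outputs of one hidden layer on a batch of
    inputs; no labels are needed to compute it.
    """
    a = np.asarray(activations, dtype=float)
    return np.linalg.svd(a, compute_uv=False)


def effective_rank(singular_values, eps=1e-12):
    """Illustrative summary only (an assumption, not the paper's score).

    Exponential of the Shannon entropy of the normalized singular values:
    close to 1 when one direction dominates, close to the number of units
    when the spectrum is flat.
    """
    s = np.asarray(singular_values, dtype=float)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))


# Toy stand-ins for hidden activations (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
full = rng.standard_normal((64, 8))  # well-spread spectrum
collapsed = np.outer(rng.standard_normal(64), rng.standard_normal(8))  # rank 1

rank_full = effective_rank(hidden_singular_values(full))
rank_collapsed = effective_rank(hidden_singular_values(collapsed))
print(f"effective rank, spread activations:    {rank_full:.2f}")
print(f"effective rank, collapsed activations: {rank_collapsed:.2f}")
```

In a training loop, one could record such a summary for a fixed batch after every epoch and inspect how it evolves; which summary and which decision rule actually predict success is what the paper studies, and is not reproduced here.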

Comments:	10 pages, 9 figures
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2111.13075 [cs.LG]
	(or arXiv:2111.13075v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2111.13075

Submission history

From: Umangi Jain
[v1] Thu, 25 Nov 2021 13:27:39 UTC (617 KB)
