The marginal value of adaptive gradient methods in machine learning

Adaptive optimisation methods, which perform local optimisation with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks; examples include AdaGrad, RMSProp, and Adam. We show that for simple overparameterised problems, adaptive methods often find drastically different solutions than gradient descent (GD) or stochastic gradient descent (SGD). We construct an illustrative binary classification problem in which the data are linearly separable, GD and SGD achieve zero test error, and AdaGrad, Adam, and RMSProp attain test errors arbitrarily close to one half, i.e., no better than chance. We additionally study the empirical generalisation capability of adaptive methods on several state-of-the-art deep learning models, and observe that the solutions found by adaptive methods generalise worse (often significantly worse) than those found by SGD, even when the adaptive solutions have better training performance. These results suggest that practitioners should reconsider the use of adaptive methods to train neural networks.
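For readers unfamiliar with the update rules named above, the following is a minimal, illustrative sketch of the per-coordinate updates for SGD, AdaGrad, and Adam. It is not the paper's code; the function names, the `state` dictionary, and the toy data point in the usage example at the end are all assumptions made purely for illustration.

```python
import numpy as np

def sgd_step(w, grad, state, lr=0.1):
    # Plain (stochastic) gradient descent: every coordinate is scaled
    # by the same step size.
    return w - lr * grad, state

def adagrad_step(w, grad, state, lr=0.1, eps=1e-8):
    # AdaGrad divides each coordinate by the root of its accumulated
    # squared gradients -- the "metric constructed from the history
    # of iterates" referred to in the abstract.
    state["G"] = state.get("G", 0.0) + grad ** 2
    return w - lr * grad / (np.sqrt(state["G"]) + eps), state

def adam_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps exponential moving averages of the gradient (m) and its
    # square (v), with bias correction for the early iterations.
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * grad
    v = b2 * state.get("v", 0.0) + (1 - b2) * grad ** 2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), state

# Hypothetical usage: one logistic-loss gradient step on a single example.
w, state = np.zeros(3), {}
x, y = np.array([1.0, 0.0, 2.0]), 1.0              # made-up data point
grad = -y * x / (1.0 + np.exp(y * (w @ x)))        # gradient of log-loss
w, state = adagrad_step(w, grad, state)
```

The point of the contrast: on a sparse, linearly separable problem, the per-coordinate rescaling in AdaGrad and Adam can steer the iterates toward a different separating hyperplane than the one GD and SGD converge to, which is, roughly, the mechanism the paper's construction exploits.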

Details

Author(s): Ashia Wilson
Publication date: 23 May 2017
Source: Advances in Neural Information Processing Systems
Related programme: MIT Jameel Clinic
Link to publication: external link

Generative AI in the era of 'alternative facts' | MIT Open Publishing Services

External data and AI are making each other more valuable | Harvard Business Review Press

Removing biases from molecular representations via information maximisation | arXiv

Effective human-AI teams via learned natural language rules and onboarding | arXiv

A deep dive into single-cell RNA sequencing foundation models | bioRxiv

Antibiotic identified by AI | Nature

LLM-grounded video diffusion models | arXiv

Successful Development of a Natural Language Processing Algorithm for Pancreatic Neoplasms and Associated Histologic Features | Pancreas

Leveraging artificial intelligence in the fight against infectious diseases | Science

BioAutoMATED: An end-to-end automated machine learning tool for explanation and design of biological sequences | Cell Systems

Conformal language modeling | arXiv

Comparison of mammography AI algorithms with a clinical risk model for 5-year breast cancer risk prediction: An observational study | Radiological Society of North America

Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii | Nature

Algorithmic pluralism: A structural approach towards equal opportunity | arXiv

Artificial intelligence and machine learning in lung cancer screening | ScienceDirect

Wide and deep neural networks achieve consistency for classification | PNAS

Autocatalytic base editing for RNA-responsive translational control | Nature

DiffDock: Diffusion steps, twists and turns for molecular docking | arXiv

Sybil: A Validated Deep Learning Model to Predict Future Lung Cancer Risk From a Single Low-Dose Chest Computed Tomography | Journal of Clinical Oncology

Sequential multi-dimensional self-supervised learning for clinical time series | Proceedings of Machine Learning Research

Queueing theory: Classical and modern methods | Dynamic Ideas

Toward robust mammography-based models for breast cancer risk | Science

The age of AI: And our human future | Little, Brown and Company

Uniform priors for data-efficient transfer | arXiv

Machine learning under a modern optimisation lens | Dynamic Ideas

The marginal value of adaptive gradient methods in machine learning | Advances in Neural Information Processing Systems

Efficient graph-based image segmentation | International Journal of Computer Vision
