The marginal value of adaptive gradient methods in machine learning

Adaptive optimisation methods, which perform local optimisation with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks; examples include AdaGrad, RMSProp, and Adam. We show that for simple overparameterised problems, adaptive methods often find drastically different solutions than gradient descent (GD) or stochastic gradient descent (SGD). We construct an illustrative binary classification problem in which the data are linearly separable, GD and SGD achieve zero test error, and AdaGrad, Adam, and RMSProp attain test errors arbitrarily close to one half, i.e., no better than chance. We additionally study the empirical generalisation capability of adaptive methods on several state-of-the-art deep learning models, and observe that the solutions found by adaptive methods generalise worse (often significantly worse) than those found by SGD, even when the adaptive solutions have better training performance. These results suggest that practitioners should reconsider the use of adaptive methods to train neural networks.
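For readers unfamiliar with the update rules named above, the following is a minimal, illustrative sketch of the per-coordinate updates for SGD, AdaGrad, and Adam. It is not the paper's code; the function names, the `state` dictionary, and the toy data point in the usage example at the end are all assumptions made purely for illustration.

```python
import numpy as np

def sgd_step(w, grad, state, lr=0.1):
    # Plain (stochastic) gradient descent: every coordinate is scaled
    # by the same step size.
    return w - lr * grad, state

def adagrad_step(w, grad, state, lr=0.1, eps=1e-8):
    # AdaGrad divides each coordinate by the root of its accumulated
    # squared gradients -- the "metric constructed from the history
    # of iterates" referred to in the abstract.
    state["G"] = state.get("G", 0.0) + grad ** 2
    return w - lr * grad / (np.sqrt(state["G"]) + eps), state

def adam_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps exponential moving averages of the gradient (m) and its
    # square (v), with bias correction for the early iterations.
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * grad
    v = b2 * state.get("v", 0.0) + (1 - b2) * grad ** 2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), state

# Hypothetical usage: one logistic-loss gradient step on a single example.
w, state = np.zeros(3), {}
x, y = np.array([1.0, 0.0, 2.0]), 1.0              # made-up data point
grad = -y * x / (1.0 + np.exp(y * (w @ x)))        # gradient of log-loss
w, state = adagrad_step(w, grad, state)
```

The point of the contrast: on a sparse, linearly separable problem, the per-coordinate rescaling in AdaGrad and Adam can steer the iterates toward a different separating hyperplane than the one GD and SGD converge to, which is, roughly, the mechanism the paper's construction exploits.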

Details

Author(s): Ashia Wilson
Publication date: 23 May 2017
Source: Advances in Neural Information Processing Systems
Related programme: MIT Jameel Clinic
Link to publication: external link

Generative AI in the era of 'alternative facts' | MIT Open Publishing Services

External data and AI are making each other more valuable | Harvard Business Review Press

Removing biases from molecular representations via information maximisation | arXiv

Effective human-AI teams via learned natural language rules and onboarding | arXiv

A deep dive into single-cell RNA sequencing foundation models | bioRxiv

Antibiotic identified by AI | Nature

LLM-grounded video diffusion models | arXiv

Successful Development of a Natural Language Processing Algorithm for Pancreatic Neoplasms and Associated Histologic Features | Pancreas

Leveraging artificial intelligence in the fight against infectious diseases | Science

BioAutoMATED: An end-to-end automated machine learning tool for explanation and design of biological sequences | Cell Systems

Conformal language modeling | arXiv

Comparison of mammography AI algorithms with a clinical risk model for 5-year breast cancer risk prediction: An observational study | Radiological Society of North America

Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii | Nature

Algorithmic pluralism: A structural approach towards equal opportunity | arXiv

Artificial intelligence and machine learning in lung cancer screening | ScienceDirect

Wide and deep neural networks achieve consistency for classification | PNAS

Autocatalytic base editing for RNA-responsive translational control | Nature

DiffDock: Diffusion steps, twists and turns for molecular docking | arXiv

Sybil: A Validated Deep Learning Model to Predict Future Lung Cancer Risk From a Single Low-Dose Chest Computed Tomography | Journal of Clinical Oncology

Sequential multi-dimensional self-supervised learning for clinical time series | Proceedings of Machine Learning Research

Queueing theory: Classical and modern methods | Dynamic Ideas

Toward robust mammography-based models for breast cancer risk | Science

The age of AI: And our human future | Little, Brown and Company

Uniform priors for data-efficient transfer | arXiv

Machine learning under a modern optimisation lens | Dynamic Ideas

The marginal value of adaptive gradient methods in machine learning | Advances in Neural Information Processing Systems

Efficient graph-based image segmentation | International Journal of Computer Vision
