Preface
Preliminary information for the blog post series on machine learning and optimization.
Dive into the mathematical underpinnings of optimization algorithms in machine learning. This series explores core theory, algorithm design, and connections to diverse mathematical fields, moving from fundamental concepts to advanced topics.
In outline, the series covers the following posts:
An introduction to the core ideas of mathematical optimization and their indispensable role in machine learning. We use linear regression to build intuition.
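As a taste of how that post builds intuition, here is a minimal sketch of fitting linear regression by gradient descent on a mean squared error; the toy data, step size, and iteration count below are illustrative assumptions, not taken from the post.

```python
import numpy as np

# Illustrative toy data (an assumption for this sketch): y ≈ 3x + 1 plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=100)

# Append a bias column so the model is y ≈ X_b @ w with w = [slope, intercept].
X_b = np.column_stack([X, np.ones(len(X))])
w = np.zeros(2)

learning_rate = 0.1
for _ in range(500):
    residual = X_b @ w - y                  # prediction errors
    grad = 2.0 * X_b.T @ residual / len(y)  # gradient of the mean squared error
    w = w - learning_rate * grad            # gradient descent step

print(w)  # should land close to [3.0, 1.0]
```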
An introduction to iterative optimization methods, differentiating between gradient-free and gradient-based approaches and covering their principles, pros, cons, and applications.
A discussion of the key desirable properties for optimization algorithms in machine learning, covering effectiveness, efficiency, robustness, invariance, and practicality.
A quick tour of popular gradient-based optimization algorithms in machine learning, detailing their mechanics and empirical performance characteristics.
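To make that tour concrete, here is a minimal Python sketch of the update rules for plain SGD, heavy-ball momentum, and Adam; the function names and default hyperparameters are illustrative choices for this preface, not the posts' code. Each function takes the parameters `w`, a (stochastic) gradient `g`, and, where needed, a per-optimizer `state` dict, and returns the updated parameters.

```python
import numpy as np

def sgd_step(w, g, lr=1e-2):
    """Plain (stochastic) gradient descent."""
    return w - lr * g

def momentum_step(w, g, state, lr=1e-2, beta=0.9):
    """Heavy-ball momentum: accumulate a velocity, then step along it."""
    v = beta * state.get("v", np.zeros_like(w)) + g
    state["v"] = v
    return w - lr * v

def adam_step(w, g, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected moment estimates act as a diagonal preconditioner."""
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", np.zeros_like(w)) + (1 - beta1) * g
    v = beta2 * state.get("v", np.zeros_like(w)) + (1 - beta2) * g**2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1**t)   # bias correction for the first moment
    v_hat = v / (1 - beta2**t)   # bias correction for the second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```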
Establishing the mathematical framework for machine learning optimization, integrating foundational principles with modern generalization theory.
Exploring the mathematical foundations of gradient descent, its continuous analogue gradient flow, and their connections.
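As a one-line preview of that connection (a sketch, not the post's full treatment): gradient descent with step size $\eta$ is the forward-Euler discretization of the gradient-flow ODE,

$$
\dot{x}(t) = -\nabla f\big(x(t)\big)
\quad\Longrightarrow\quad
x_{k+1} = x_k - \eta\,\nabla f(x_k).
$$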
Analyzing why non-convex, high-dimensional loss landscapes in deep learning defy classical optimization intuition yet remain optimizable.
How SGD's inherent randomness creates implicit regularization, helps escape local minima, and shapes generalization, setting the stage for soft inductive biases.
A deep dive into soft inductive biases, focusing on how regularization techniques and optimization dynamics guide machine learning models, particularly in deep learning.
Exploring how adaptive methods and preconditioning reshape optimization problems for faster convergence, from classical techniques to matrix-free innovations.
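The common template behind such methods, sketched here as a preview, is preconditioned gradient descent, where a positive-definite matrix $P_k$ encodes an (approximate) notion of curvature or scale:

$$
x_{k+1} = x_k - \eta\, P_k^{-1}\,\nabla f(x_k),
$$

with $P_k = I$ recovering plain gradient descent and diagonal or factored choices of $P_k$ giving adaptive and (quasi-)second-order methods.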
Delving into the momentum method, tracing its origins both to second-order 'heavy ball' dynamics and to its interpretation as a linear multi-step method for first-order ODEs.
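As a preview, Polyak's heavy-ball update and the damped second-order dynamics it can be read as discretizing are

$$
m\,\ddot{x}(t) + \gamma\,\dot{x}(t) = -\nabla f\big(x(t)\big),
\qquad
x_{k+1} = x_k - \eta\,\nabla f(x_k) + \beta\,(x_k - x_{k-1}),
$$

where $\beta \in [0, 1)$ is the momentum coefficient.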
Exploring Adam as an approximation to natural gradient descent using the diagonal empirical Fisher Information Matrix, and improvements proposed by FAdam.
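Loosely, and leaving the details to the post: natural gradient descent preconditions the gradient by the inverse Fisher information $F^{-1}$, while Adam's second-moment estimate $\hat{v}_t$ tracks the diagonal of the empirical Fisher, so its update applies a square-root, diagonal approximation of that preconditioner:

$$
\theta_{t+1} = \theta_t - \alpha\, F(\theta_t)^{-1}\nabla L(\theta_t)
\qquad\text{vs.}\qquad
\theta_{t+1} = \theta_t - \alpha\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},
\qquad
\hat{v}_t \approx \operatorname{diag}\!\big(\mathbb{E}[\,g_t \odot g_t\,]\big).
$$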
Exploring recent theoretical perspectives on the Adam optimizer through the Follow-The-Regularized-Leader (FTRL) view of its updates, along with its provable benefits.
Exploring how choosing the right norm for parameter spaces (like dimension-agnostic operator norms) and principled preconditioning (like PolarGrad) can revolutionize deep learning optimization.
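The organizing idea, previewed here as a sketch, is steepest descent under a chosen norm: the step minimizes a linearization of the loss plus a norm penalty, and swapping the Euclidean norm for an operator norm changes the geometry of the update:

$$
\Delta x^\star = \operatorname*{arg\,min}_{\Delta x}\;\langle \nabla f(x), \Delta x\rangle + \frac{1}{2\eta}\,\|\Delta x\|^2,
\qquad
x_{k+1} = x_k + \Delta x^\star,
$$

which recovers plain gradient descent for the Euclidean norm and differently preconditioned updates for other norms.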
Exploring optimization algorithms that dynamically adapt to the problem's structure, reducing or eliminating the need for manual hyperparameter tuning.