Preface
Preliminary information for the blog post series on machine learning and optimization.
Dive into the mathematical underpinnings of optimization algorithms in machine learning. This series explores core theory, algorithm design, and connections to diverse mathematical fields, moving from fundamental concepts to advanced topics.
In outline, the series covers the following posts:
An introduction to the core ideas of mathematical optimization and their indispensable role in machine learning. We use linear regression to build intuition.
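As a taste of how that post builds intuition, here is a minimal sketch of fitting linear regression by gradient descent on a mean squared error; the toy data, step size, and iteration count below are illustrative assumptions, not taken from the post.

```python
import numpy as np

# Illustrative toy data (an assumption for this sketch): y ≈ 3x + 1 plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=100)

# Append a bias column so the model is y ≈ X_b @ w with w = [slope, intercept].
X_b = np.column_stack([X, np.ones(len(X))])
w = np.zeros(2)

learning_rate = 0.1
for _ in range(500):
    residual = X_b @ w - y                  # prediction errors
    grad = 2.0 * X_b.T @ residual / len(y)  # gradient of the mean squared error
    w = w - learning_rate * grad            # gradient descent step

print(w)  # should land close to [3.0, 1.0]
```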
An introduction to iterative optimization methods, differentiating between gradient-free and gradient-based approaches and covering their principles, pros, cons, and applications.
A discussion of the key desirable properties for optimization algorithms in machine learning, covering effectiveness, efficiency, robustness, invariance, and practicality.
A quick tour of popular gradient-based optimization algorithms in machine learning, detailing their mechanics and empirical performance characteristics.
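To make that tour concrete, here is a minimal Python sketch of the update rules for plain SGD, heavy-ball momentum, and Adam; the function names and default hyperparameters are illustrative choices for this preface, not the posts' code. Each function takes the parameters `w`, a (stochastic) gradient `g`, and, where needed, a per-optimizer `state` dict, and returns the updated parameters.

```python
import numpy as np

def sgd_step(w, g, lr=1e-2):
    """Plain (stochastic) gradient descent."""
    return w - lr * g

def momentum_step(w, g, state, lr=1e-2, beta=0.9):
    """Heavy-ball momentum: accumulate a velocity, then step along it."""
    v = beta * state.get("v", np.zeros_like(w)) + g
    state["v"] = v
    return w - lr * v

def adam_step(w, g, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected moment estimates act as a diagonal preconditioner."""
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", np.zeros_like(w)) + (1 - beta1) * g
    v = beta2 * state.get("v", np.zeros_like(w)) + (1 - beta2) * g**2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1**t)   # bias correction for the first moment
    v_hat = v / (1 - beta2**t)   # bias correction for the second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```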
Establishing the mathematical framework for machine learning optimization, integrating foundational principles with modern generalization theory.
Exploring the mathematical foundations of gradient descent, its continuous analogue gradient flow, and their connections.
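As a one-line preview of that connection (a sketch, not the post's full treatment): gradient descent with step size $\eta$ is the forward-Euler discretization of the gradient-flow ODE,

$$
\dot{x}(t) = -\nabla f\big(x(t)\big)
\quad\Longrightarrow\quad
x_{k+1} = x_k - \eta\,\nabla f(x_k).
$$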
Analyzing why non-convex, high-dimensional loss landscapes in deep learning defy classical optimization intuition yet remain optimizable.
How SGD's inherent randomness creates implicit regularization, helps escape local minima, and shapes generalization, setting the stage for soft inductive biases.
A deep dive into soft inductive biases, focusing on how regularization techniques and optimization dynamics guide machine learning models, particularly in deep learning.
Exploring how adaptive methods and preconditioning reshape optimization problems for faster convergence, from classical techniques to matrix-free innovations.
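The common template behind such methods, sketched here as a preview, is preconditioned gradient descent, where a positive-definite matrix $P_k$ encodes an (approximate) notion of curvature or scale:

$$
x_{k+1} = x_k - \eta\, P_k^{-1}\,\nabla f(x_k),
$$

with $P_k = I$ recovering plain gradient descent and diagonal or factored choices of $P_k$ giving adaptive and (quasi-)second-order methods.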
Delving into the momentum method, tracing its origins both to second-order 'heavy ball' dynamics and to its interpretation as a linear multi-step method for first-order ODEs.
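As a preview, Polyak's heavy-ball update and the damped second-order dynamics it can be read as discretizing are

$$
m\,\ddot{x}(t) + \gamma\,\dot{x}(t) = -\nabla f\big(x(t)\big),
\qquad
x_{k+1} = x_k - \eta\,\nabla f(x_k) + \beta\,(x_k - x_{k-1}),
$$

where $\beta \in [0, 1)$ is the momentum coefficient.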
Exploring Adam as an approximation to natural gradient descent using the diagonal empirical Fisher Information Matrix, and improvements proposed by FAdam.
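Loosely, and leaving the details to the post: natural gradient descent preconditions the gradient by the inverse Fisher information $F^{-1}$, while Adam's second-moment estimate $\hat{v}_t$ tracks the diagonal of the empirical Fisher, so its update applies a square-root, diagonal approximation of that preconditioner:

$$
\theta_{t+1} = \theta_t - \alpha\, F(\theta_t)^{-1}\nabla L(\theta_t)
\qquad\text{vs.}\qquad
\theta_{t+1} = \theta_t - \alpha\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},
\qquad
\hat{v}_t \approx \operatorname{diag}\!\big(\mathbb{E}[\,g_t \odot g_t\,]\big).
$$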
Exploring recent theoretical perspectives on the Adam optimizer through the Follow-The-Regularized-Leader (FTRL) view of its updates, along with its provable benefits.
Exploring how choosing the right norm for parameter spaces (like dimension-agnostic operator norms) and principled preconditioning (like PolarGrad) can revolutionize deep learning optimization.
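The organizing idea, previewed here as a sketch, is steepest descent under a chosen norm: the step minimizes a linearization of the loss plus a norm penalty, and swapping the Euclidean norm for an operator norm changes the geometry of the update:

$$
\Delta x^\star = \operatorname*{arg\,min}_{\Delta x}\;\langle \nabla f(x), \Delta x\rangle + \frac{1}{2\eta}\,\|\Delta x\|^2,
\qquad
x_{k+1} = x_k + \Delta x^\star,
$$

which recovers plain gradient descent for the Euclidean norm and differently preconditioned updates for other norms.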
Exploring optimization algorithms that dynamically adapt to the problem's structure, reducing or eliminating the need for manual hyperparameter tuning.