Frequency-Domain Muon for Conv Filters - Orthogonalizing the Operator
A technical note on extending Muon to orthogonalize convolution operators in the frequency domain, moving beyond simple reshaped weight projections.
A deep dive into computing the orthonormal polar factor (matrix sign function) for tall matrices using minimax polynomials, Jacobi preconditioning, and online certificates, moving beyond standard Newton-Schulz iterations.
A roadmap for GPU-friendly inverse p-th root applies and fast solves, inspired by Muon, Polar Express, and Turbo-Muon.
An introduction to Discrete Calculus, a theory for sums and differences of sequences as opposed to derivatives and integrals of functions in infinitesimal calculus.

A beginner-friendly introduction to stochastic calculus, focusing on intuition and calculus-based derivations instead of heavy probability theory formalism.
Calculus can be quite tedious when computed symbolically by hand. In many modern applications (for example, in machine learning), automatic differentiation is used to efficiently comp...
A detailed derivation of the reverse-time stochastic differential equation used in Score-Based Generative Modeling.