Autoregression vs Diffusion - Understanding Sampling via Optimal Transport
Autoregression and diffusion look like opposites, but both solve the same transport problem: how to turn simple noise into structured data.
A hardware-aware hybrid polar decomposition for ML: one Dynamic Weighted Halley (rational) step to handle the hard early regime, then two Polar Express (polynomial) cleanup steps once the spectrum is easy. The result is exactly two rectangular GEMMs, no eigendecomposition or power iteration, and robust convergence from condition numbers up to 1000.
Residual networks are first-order depth dynamics that can only produce exponential modes. Upgrading to second-order by adding a velocity state unlocks bounded oscillatory modes, separates content from transport, and connects depth to the same mathematical framework as LSTMs and state-space models.
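As a rough illustration of the first-order versus second-order contrast, here is a minimal scalar sketch. It assumes a linear residual branch with a fixed real weight `a`, and a hypothetical velocity-augmented update with parameters `omega` and `h`; these names and the specific update rule are illustrative assumptions, not the post's architecture.

```python
def first_order(a, x0, steps):
    """Scalar linear residual update x <- x + a*x, i.e. x_k = (1 + a)**k * x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + a * xs[-1])
    return xs


def second_order(omega, h, x0, v0, steps):
    """Velocity-augmented update (assumed form): v carries transport, x carries content."""
    x, v = x0, v0
    xs = [x]
    for _ in range(steps):
        v = v - h * omega**2 * x  # update the velocity state first
        x = x + h * v             # then move the content state along it
        xs.append(x)
    return xs


decay = first_order(a=-0.1, x0=1.0, steps=50)
wave = second_order(omega=1.0, h=0.1, x0=1.0, v0=0.0, steps=200)
```

With these settings the first-order trajectory shrinks geometrically, while the second-order trajectory swings between positive and negative values without growing.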
Derivation of the Lion-K optimizer with Corrected Cautious Weight Decay (CCWD) and transformation rules for hyperparameter transfer.
Rewriting a pre-norm decoder-only transformer as a mixed-geometry constrained splitting scheme: RMSNorm as radial gauge fixing, attention as an entropy- or KL-constrained simplex solve, and residual branches as Euclidean trust-region steps.

A continuous-time view of gradient-based optimization: it starts from the observation that integrator choice matters in physics simulation, then transfers that insight to modern optimizers.
A technical note on extending Muon to orthogonalize convolution operators in the frequency domain, moving beyond simple reshaped weight projections.
An introduction to Discrete Calculus, a theory for sums and differences of sequences as opposed to derivatives and integrals of functions in infinitesimal calculus.
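For a concrete taste of the idea, the sketch below shows the forward difference as the discrete counterpart of a derivative, and a sum of differences telescoping like a definite integral. The helper names `delta` and `definite_sum` are made up for this example.

```python
def delta(f):
    """Forward difference operator: (delta f)(n) = f(n + 1) - f(n)."""
    return lambda n: f(n + 1) - f(n)


def definite_sum(g, a, b):
    """Sum of g(k) for k = a, ..., b - 1, the discrete analogue of an integral from a to b."""
    return sum(g(k) for k in range(a, b))


def square(n):
    return n * n


d_square = delta(square)  # equals 2n + 1, the discrete counterpart of d/dx x^2 = 2x

# Discrete fundamental theorem: summing the differences telescopes to f(b) - f(a).
lhs = definite_sum(d_square, 2, 10)
rhs = square(10) - square(2)
```

Here `lhs` and `rhs` agree, mirroring the way an integral of a derivative reduces to a difference of endpoint values.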

A beginner-friendly introduction to stochastic calculus, focusing on intuition and calculus-based derivations instead of heavy probability theory formalism.
Introduction: Calculus can be quite tedious when computed symbolically by hand. In many modern applications (for example, in machine learning), automatic differentiation is used to efficiently compu...