Archives
- 06 May When Equivalent Weights Train Differently
- 28 Apr Fast Tight Spectral-Norm Bounds
- 19 Apr Autoregression vs Diffusion - Understanding Sampling via Optimal Transport
- 05 Apr Fast Polar Decomposition for Muon Optimizer with Rational and Polynomial Iterations
- 23 Mar Lion-K CCWD: Corrected Cautious Weight Decay
- 18 Mar Transformers as Constrained Optimization
- 04 Mar Frequency-Domain Muon for Conv Filters - Orthogonalizing the Operator