<feed xmlns="http://www.w3.org/2005/Atom"> <id>https://jiha-kim.github.io/</id><title>Ji-Ha's Blog</title><subtitle>Ji-Ha Kim's blog for sharing things.</subtitle> <updated>2026-04-11T01:11:46+00:00</updated> <author> <name>Ji-Ha Kim</name> <uri>https://jiha-kim.github.io/</uri> </author><link rel="self" type="application/atom+xml" href="https://jiha-kim.github.io/feed.xml"/><link rel="alternate" type="text/html" hreflang="en" href="https://jiha-kim.github.io/"/> <generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator> <rights> © 2026 Ji-Ha Kim </rights> <icon>/assets/img/favicons/favicon.ico</icon> <logo>/assets/img/favicons/favicon-96x96.png</logo> <entry><title>Fast Polar Decomposition for Muon Optimizer with Rational and Polynomial Iterations</title><link href="https://jiha-kim.github.io/posts/rational-polar-decomposition/" rel="alternate" type="text/html" title="Fast Polar Decomposition for Muon Optimizer with Rational and Polynomial Iterations" /><published>2026-04-05T00:00:00+00:00</published> <updated>2026-04-10T12:41:38+00:00</updated> <id>https://jiha-kim.github.io/posts/rational-polar-decomposition/</id> <content type="text/html" src="https://jiha-kim.github.io/posts/rational-polar-decomposition/" /> <author> <name>jiha</name> </author> <category term="Numerical Linear Algebra" /> <category term="Mathematical Optimization" /> <summary>A hardware-aware hybrid polar decomposition for ML: one Dynamic Weighted Halley (rational) step to handle the hard early regime, then two Polar Express (polynomial) cleanup steps once the spectrum is easy. 
The result is exactly two rectangular GEMMs, no eigendecomposition or power iteration, and robust convergence for condition numbers up to 1000.</summary> </entry> <entry><title>Dynamics of Higher-Order Residual Networks</title><link href="https://jiha-kim.github.io/posts/second-order-residual-networks/" rel="alternate" type="text/html" title="Dynamics of Higher-Order Residual Networks" /><published>2026-03-31T19:00:00+00:00</published> <updated>2026-03-31T20:49:40+00:00</updated> <id>https://jiha-kim.github.io/posts/second-order-residual-networks/</id> <content type="text/html" src="https://jiha-kim.github.io/posts/second-order-residual-networks/" /> <author> <name>jiha</name> </author> <category term="Machine Learning" /> <category term="Mathematical Foundations" /> <summary>Residual networks are first-order depth dynamics that can only produce exponential modes. Upgrading to second-order by adding a velocity state unlocks bounded oscillatory modes, separates content from transport, and connects depth to the same mathematical framework as LSTMs and state-space models.</summary> </entry> <entry><title>Lion-K CCWD: Corrected Cautious Weight Decay and Hyperparameter Transfer</title><link href="https://jiha-kim.github.io/posts/lion-k-ccwd/" rel="alternate" type="text/html" title="Lion-K CCWD: Corrected Cautious Weight Decay and Hyperparameter Transfer" /><published>2026-03-23T11:00:00+00:00</published> <updated>2026-03-27T23:58:31+00:00</updated> <id>https://jiha-kim.github.io/posts/lion-k-ccwd/</id> <content type="text/html" src="https://jiha-kim.github.io/posts/lion-k-ccwd/" /> <author> <name>jiha</name> </author> <category term="Machine Learning" /> <category term="Mathematical Optimization" /> <summary>Derivation of the Lion-K optimizer with Corrected Cautious Weight Decay (CCWD) and transformation rules for hyperparameter transfer.</summary> </entry> <entry><title>Transformers as Constrained Optimization</title><link
href="https://jiha-kim.github.io/posts/transformers-as-constrained-optimization/" rel="alternate" type="text/html" title="Transformers as Constrained Optimization" /><published>2026-03-18T23:38:00+00:00</published> <updated>2026-03-19T17:00:46+00:00</updated> <id>https://jiha-kim.github.io/posts/transformers-as-constrained-optimization/</id> <content type="text/html" src="https://jiha-kim.github.io/posts/transformers-as-constrained-optimization/" /> <author> <name>jiha</name> </author> <category term="Machine Learning" /> <category term="Mathematical Optimization" /> <summary>Rewriting a pre-norm decoder-only transformer as a mixed-geometry constrained splitting scheme: RMSNorm as radial gauge fixing, attention as an entropy- or KL-constrained simplex solve, and residual branches as Euclidean trust-region steps.</summary> </entry> <entry><title>Optimizers and ODEs</title><link href="https://jiha-kim.github.io/posts/optimizers-and-odes/" rel="alternate" type="text/html" title="Optimizers and ODEs" /><published>2026-03-15T22:59:00+00:00</published> <updated>2026-03-16T01:49:49+00:00</updated> <id>https://jiha-kim.github.io/posts/optimizers-and-odes/</id> <content type="text/html" src="https://jiha-kim.github.io/posts/optimizers-and-odes/" /> <author> <name>jiha</name> </author> <category term="Machine Learning" /> <category term="Mathematical Optimization" /> <summary>A continuous-time view of gradient-based optimization: starting from the observation that integrator choice matters in physics simulation, and transferring that insight to understand modern optimizers.</summary> </entry> </feed>
