Transformers (2)

Transformers as Constrained Optimization, Mar 18, 2026
When Equivalent Weights Train Differently, May 6, 2026