Cheat Sheet: Elementary Functional Analysis for Optimization
A concise summary of core functional analysis concepts, emphasizing bra-ket notation, dual spaces, and transformation properties, crucial for machine learning optimization theory.
Elementary Functional Analysis: Key Concepts for Optimization
This cheat sheet summarizes core concepts from functional analysis, particularly relevant to optimization theory in machine learning. The focus is on understanding the distinct nature of mathematical objects (kets and bras) and their behavior under transformations, using Dirac’s bra-ket notation.
1. Distinguishing Kets and Bras: Duality and Transformations
The fundamental insight is that “vectors” (kets) and “linear functionals” (bras) are different types of mathematical objects, distinguished by how their components transform under a change of basis. This ensures that physical scalar quantities derived from their pairing remain invariant.
Notation & Core Idea: Bra-Ket (Dirac Notation) & Transformation Duality
- Ket Vector \(\vert v\rangle \in V\): Represents an abstract vector or state (e.g., a “ruler”).
- Components: \(\vert v\rangle = \sum_i v^i \vert e_i\rangle\). The components \(v^i\) are contravariant.
- Bra Vector \(\langle f \vert \in V^\ast\): Represents a covector/linear functional (e.g., a “pencil” drawing contours), acting on kets to produce scalars.
- Components: \(\langle f \vert = \sum_j f_j \langle \epsilon^j \vert\). The components \(f_j\) are covariant.
- Dual Basis \(\{ \langle \epsilon^j \vert \}\) for \(V^\ast\): Defined by \(\langle \epsilon^j \vert e_i\rangle = \delta^j_i\) (Kronecker delta) relative to a primal basis \(\{ \vert e_i\rangle \}\) for \(V\). Dual basis elements also transform contravariantly.
- Invariant Pairing: The action \(\langle f \vert v\rangle = \sum_k f_k v^k\) is a scalar invariant under basis transformations. This invariance dictates the reciprocal transformation rules for \(v^i\) and \(f_j\).
Key Insight: Transformation Rules
If primal basis kets scale as \(\vert e'_i \rangle = \alpha_i \vert e_i \rangle\):
- Ket components scale as \((v')^i = v^i / \alpha_i\) (contravariant).
- Dual basis bras scale as \(\langle (\epsilon')^j \vert = (1/\alpha_j) \langle \epsilon^j \vert\) (contravariant).
- Bra components scale as \((f')_j = f_j \alpha_j\) (covariant).
This differing behavior underpins the distinction between vectors (kets) and covectors (bras) and is central to tensor calculus.
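A minimal numpy sketch (illustrative, not from the source) that rescales a basis by factors \(\alpha_i\) and checks these reciprocal rules, including invariance of the pairing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Components in the original basis {|e_i>}.
v = rng.standard_normal(n)   # ket components v^i (contravariant)
f = rng.standard_normal(n)   # bra components f_j (covariant)

# Rescale each basis ket: |e'_i> = alpha_i |e_i>.
alpha = rng.uniform(0.5, 2.0, size=n)

v_new = v / alpha            # contravariant: (v')^i = v^i / alpha_i
f_new = f * alpha            # covariant:     (f')_j = f_j * alpha_j

# The pairing <f|v> = sum_k f_k v^k is invariant under the change of basis.
assert np.isclose(f @ v, f_new @ v_new)
print("pairing before:", f @ v, " after:", f_new @ v_new)
```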
2. Normed Vector Spaces and Banach Spaces
These structures allow us to measure “size” and define “completeness.”
Definition. Norm & Normed Space
A norm on a vector space \(V\) is \(\Vert \cdot \Vert_V : V \to \mathbb{R}\) satisfying:
- \(\Vert \vert x \rangle \Vert_V \ge 0\) (Non-negativity)
- \(\Vert \vert x \rangle \Vert_V = 0 \iff \vert x\rangle = \vert \mathbf{0}\rangle\) (Definiteness)
- \(\Vert c \vert x \rangle \Vert_V = \vert c \vert \Vert \vert x \rangle \Vert_V\) (Absolute homogeneity)
- \(\Vert \vert x \rangle + \vert y \rangle \Vert_V \le \Vert \vert x \rangle \Vert_V + \Vert \vert y \rangle \Vert_V\) (Triangle Inequality)
A vector space equipped with a norm is a normed vector space.
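As a quick numerical sanity check (a sketch, not part of the source), the axioms hold for the familiar \(\ell_p\) norms on \(\mathbb{R}^n\):

```python
import numpy as np

rng = np.random.default_rng(8)
x, y = rng.standard_normal(6), rng.standard_normal(6)
c = -3.7

for p in (1, 2, np.inf):
    nx = np.linalg.norm(x, p)
    assert nx >= 0                                                # non-negativity
    assert np.isclose(np.linalg.norm(c * x, p), abs(c) * nx)      # homogeneity
    assert np.linalg.norm(x + y, p) <= nx + np.linalg.norm(y, p)  # triangle inequality
assert np.linalg.norm(np.zeros(6), 2) == 0                        # definiteness (one direction)
```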
Definition. Dual Norm (on \(V^\ast\))
For \(\langle f \vert \in V^\ast\) (the space of continuous linear functionals on \(V\)):
\[\Vert \langle f \vert \Vert_{V^\ast} = \sup_{\Vert \vert x \rangle \Vert_V = 1} \vert \langle f \vert x \rangle \vert = \sup_{\vert x \rangle \ne \vert \mathbf{0} \rangle} \frac{\vert \langle f \vert x \rangle \vert}{\Vert \vert x \rangle \Vert_V}\]
This measures the maximum “amplification” of a functional.
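A hedged sketch in \(\mathbb{R}^n\): the dual of the \(\ell_1\) norm is known to be the \(\ell_\infty\) norm. The code below checks this by sampling the unit \(\ell_1\)-sphere (which only gives a lower bound) and by evaluating at the maximizing signed basis ket:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(5)                       # bra components of a functional on R^5

# Sampling the unit l1-sphere gives a lower bound on the dual norm.
xs = rng.standard_normal((100_000, 5))
xs /= np.abs(xs).sum(axis=1, keepdims=True)      # now ||x||_1 = 1 for each row
estimate = np.abs(xs @ f).max()

# Closed form: the dual of the l1 norm is the l-infinity norm,
# attained at a signed standard basis ket +-|e_k>.
k = np.argmax(np.abs(f))
x_star = np.zeros(5)
x_star[k] = np.sign(f[k])                        # ||x_star||_1 = 1
print(estimate, "<=", np.abs(f).max(), "=", f @ x_star)
```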
Definition. Banach Space
A normed vector space where every Cauchy sequence converges to an element within the space.
- Significance: Ensures limits exist, crucial for iterative algorithms.
- \(V^\ast\) (with the dual norm) is always a Banach space, even if \(V\) is not.
3. Inner Product Spaces and Hilbert Spaces
Inner products introduce richer geometric structure (angles, orthogonality).
Definition. Inner Product (Complex Case)
A function \(\langle \cdot \vert \cdot \rangle : V \times V \to \mathbb{C}\) satisfying for kets \(\vert x\rangle, \vert y\rangle, \vert z\rangle \in V\) and scalar \(c\):
- Conjugate Symmetry: \(\langle y \vert x \rangle = (\langle x \vert y \rangle)^\ast\).
- Linearity in the second argument (ket): \(\langle z \vert c x + y \rangle = c \langle z \vert x \rangle + \langle z \vert y \rangle\).
- (Implied) Conjugate-linearity in the first argument (the bra slot): \(\langle c z + y \vert x \rangle = c^\ast \langle z \vert x \rangle + \langle y \vert x \rangle\).
- Positive-definiteness: \(\langle x \vert x \rangle \ge 0\) (real), and \(\langle x \vert x \rangle = 0 \iff \vert x\rangle = \vert \mathbf{0}\rangle\).
A vector space equipped with an inner product is an inner product space. The inner product induces a norm: \(\Vert \vert x \rangle \Vert = \sqrt{\langle x \vert x \rangle}\).
Definition. Hilbert Space (\(\mathcal{H}\))
An inner product space that is complete with respect to its induced norm.
- Hilbert spaces are Banach spaces with geometrically rich norms.
- Key properties: Cauchy-Schwarz inequality (\(\vert \langle x \vert y \rangle \vert \le \Vert \vert x \rangle \Vert \Vert \vert y \rangle \Vert\)), orthogonality (\(\langle x \vert y \rangle = 0\)).
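These properties can be spot-checked for the standard inner product on \(\mathbb{C}^n\) (a minimal sketch; note that `np.vdot` conjugates its first argument, matching the bra slot):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def ip(a, b):
    """Standard inner product on C^n, conjugate-linear in the first (bra) slot."""
    return np.vdot(a, b)              # vdot conjugates its first argument

# Conjugate symmetry: <y|x> = <x|y>*
assert np.isclose(ip(y, x), np.conj(ip(x, y)))

# Cauchy-Schwarz: |<x|y>| <= ||x|| ||y||
assert abs(ip(x, y)) <= np.sqrt(ip(x, x).real * ip(y, y).real) + 1e-12

# The induced norm agrees with numpy's l2 norm.
assert np.isclose(np.sqrt(ip(x, x).real), np.linalg.norm(x))
```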
Theorem. Riesz Representation Theorem (for Hilbert Spaces)
For every continuous linear functional \(\langle \phi \vert \in \mathcal{H}^\ast\), there exists a unique ket \(\vert y_\phi\rangle \in \mathcal{H}\) such that:
\[\langle \phi \vert x \rangle = \langle y_\phi \vert x \rangle \quad \text{for all } \vert x\rangle \in \mathcal{H}\]
Moreover, \(\Vert \langle \phi \vert \Vert_{\mathcal{H}^\ast} = \Vert \vert y_\phi \rangle \Vert_{\mathcal{H}}\).
- Significance: In Hilbert spaces, bras (functionals) can be uniquely identified with kets via the inner product. This is the “magic bridge” that often makes the distinction seem less critical in \(\mathbb{R}^n\) with the dot product, but the underlying “types” remain different. The mapping is anti-linear for complex spaces.
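A sketch of the theorem in \(\mathbb{C}^n\) with a weighted inner product \(\langle y \vert x \rangle = y^H G x\) (the matrix \(G\) and functional \(f\) below are illustrative assumptions, not from the source):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

# A Hermitian positive-definite Gram matrix G defines <y|x> = y^H G x.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
G = B.conj().T @ B + n * np.eye(n)

# A continuous linear functional phi(x) = f^H x, given by its bra components.
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Riesz representer: y_phi^H G x = f^H x for all x  =>  y_phi = G^{-1} f.
y_phi = np.linalg.solve(G, f)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.isclose(f.conj() @ x, y_phi.conj() @ G @ x)
```

Note how the representer depends on the metric \(G\): the same functional corresponds to different kets under different inner products.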
4. Linear Operators and Adjoints in Hilbert Spaces
Definition. Linear Operator & Adjoint
- Linear Operator \(T: \mathcal{H}_1 \to \mathcal{H}_2\): Maps kets linearly.
- Adjoint Operator \(T^\dagger: \mathcal{H}_2 \to \mathcal{H}_1\): Defined by the relation:
\[\langle y \vert T x \rangle_{\mathcal{H}_2} = \langle T^\dagger y \vert x \rangle_{\mathcal{H}_1} \quad \text{for all } \vert x\rangle \in \mathcal{H}_1,\ \vert y\rangle \in \mathcal{H}_2\]
- The matrix of \(T^\dagger\) is \(A^H\) (the conjugate transpose of the matrix \(A\) of \(T\)) if and only if both bases are orthonormal. Otherwise it is \(G_1^{-1} A^H G_2\), where \(G_1, G_2\) are the Gram matrices of the two inner products, highlighting the role of the metric and the transformation rules; see the numerical sketch after this list.
- Key Types (for \(T: \mathcal{H} \to \mathcal{H}\)):
- Self-Adjoint (Hermitian): \(T = T^\dagger\). Eigenvalues are real.
- Unitary (Orthogonal for real): \(T^\dagger T = T T^\dagger = I\). Preserves inner products.
- Normal: \(T T^\dagger = T^\dagger T\).
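A numerical sketch of the Gram-matrix formula above (all matrices are illustrative; real case, so \(A^H = A^T\)):

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2 = 4, 3

def spd(n, seed):
    M = np.random.default_rng(seed).standard_normal((n, n))
    return M.T @ M + n * np.eye(n)

G1, G2 = spd(n1, 10), spd(n2, 11)     # Gram matrices of <.|.>_1 and <.|.>_2
A = rng.standard_normal((n2, n1))     # matrix of T : H1 -> H2

# Matrix of the adjoint in non-orthonormal bases: G1^{-1} A^H G2.
B = np.linalg.solve(G1, A.T @ G2)

x, y = rng.standard_normal(n1), rng.standard_normal(n2)

# Defining relation <y | T x>_2 = <T^dagger y | x>_1:
assert np.isclose(y @ G2 @ (A @ x), (B @ y) @ G1 @ x)
```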
Theorem. Spectral Theorem (for Compact Self-Adjoint Operators)
If \(T\) is a compact self-adjoint operator on \(\mathcal{H}\), there’s an orthonormal basis of eigenkets \((\vert \phi_k \rangle)_k\) with real eigenvalues \((\lambda_k)_k\) such that:
\[T = \sum_k \lambda_k \vert \phi_k\rangle \langle \phi_k \vert\]
Each term \(\vert \phi_k\rangle \langle \phi_k \vert\) is a rank-one projection operator. The SVD is the analogous decomposition for general compact operators.
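In finite dimensions the theorem reduces to the eigendecomposition of a Hermitian matrix; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4

# A Hermitian ("self-adjoint") matrix stands in for a compact self-adjoint operator.
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T = (C + C.conj().T) / 2

# np.linalg.eigh returns real eigenvalues and orthonormal eigenkets,
# consistent with self-adjointness.
lam, Phi = np.linalg.eigh(T)

# Reconstruct T = sum_k lambda_k |phi_k><phi_k| from rank-one projections.
T_rebuilt = sum(l * np.outer(phi, phi.conj())
                for l, phi in zip(lam, Phi.T))
assert np.allclose(T, T_rebuilt)
```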
5. Derivatives in Normed Spaces (for Optimization)
For a function \(J: V \to \mathbb{R}\) (e.g., loss function, \(V\) is parameter space).
Definition. Fréchet Derivative
\(J: U \subseteq V \to \mathbb{R}\) is Fréchet differentiable at \(\vert x\rangle \in U\) if there’s a bounded linear functional \(DJ(\vert x\rangle) : V \to \mathbb{R}\) such that:
\[J(\vert x \rangle + \vert h \rangle) = J(\vert x \rangle) + \langle DJ(\vert x \rangle) \vert h \rangle + o(\Vert \vert h \rangle \Vert_V)\]
- The Fréchet derivative \(DJ(\vert x\rangle)\) is a bra: \(\langle DJ(\vert x\rangle) \vert \in V^\ast\). It’s a “measurement device” for the linear rate of change.
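A sketch treating the derivative bra concretely for the illustrative loss \(J(\vert x\rangle) = \Vert A x - b \Vert^2\) (names assumed, not from the source); the first-order remainder shrinks quadratically:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)

def J(x):
    r = A @ x - b
    return r @ r                      # J(x) = ||Ax - b||^2

def DJ(x):
    return 2 * (A @ x - b) @ A        # bra components of DJ(x), a row covector

x = rng.standard_normal(5)
h = rng.standard_normal(5)

# First-order expansion: J(x + t h) - J(x) = t <DJ(x)|h> + o(t).
for t in (1e-2, 1e-4, 1e-6):
    lin = t * (DJ(x) @ h)
    print(t, abs(J(x + t * h) - J(x) - lin))   # remainder shrinks like t^2
```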
Definition. Gradient Ket (in Hilbert Space \(\mathcal{H}\))
The unique ket \(\vert \nabla J(\vert x\rangle) \rangle \in \mathcal{H}\) obtained from the Fréchet derivative bra \(\langle DJ(\vert x\rangle) \vert\) via the Riesz Representation Theorem:
\[\langle DJ(\vert x\rangle) \vert h \rangle = \langle \nabla J(\vert x\rangle) \vert h \rangle_{\mathcal{H}} \quad \text{for all } \vert h\rangle \in \mathcal{H}\]
- The gradient \(\vert \nabla J(\vert x\rangle) \rangle\) is a ket in \(\mathcal{H}\) (direction of steepest ascent), distinct in type from the derivative functional (bra).
- Hessian Operator \(\nabla^2 J(\vert x\rangle)\): A self-adjoint operator \(\mathcal{H} \to \mathcal{H}\) representing the second derivative, obtained by applying Riesz to the second Fréchet derivative functional.
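A sketch of the Riesz step for the same illustrative loss, with an assumed metric \(G\) on parameter space: the derivative bra is metric-independent, while the gradient ket is \(G^{-1}\) applied to it. Changing \(G\) changes which ket counts as steepest ascent, which is the mechanism behind preconditioned and natural-gradient methods.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
A = rng.standard_normal((8, n))
b = rng.standard_normal(8)

def DJ(x):
    return 2 * (A @ x - b) @ A        # derivative bra components (metric-independent)

# An assumed metric on parameter space: <y|x>_H = y^T G x, G symmetric positive-definite.
M = rng.standard_normal((n, n))
G = M.T @ M + n * np.eye(n)

x = rng.standard_normal(n)

# Riesz: <grad J | h>_H = <DJ(x)|h> for all h  =>  grad = G^{-1} DJ(x)^T.
grad = np.linalg.solve(G, DJ(x))

h = rng.standard_normal(n)
assert np.isclose(grad @ G @ h, DJ(x) @ h)

# With G = I the gradient ket reduces to the familiar 2 A^T (A x - b).
assert np.allclose(DJ(x), 2 * A.T @ (A @ x - b))
```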
This careful distinction between kets and bras, and their transformation properties, provides a robust foundation for understanding advanced optimization techniques and the geometry of machine learning models.