Motivating Banach Spaces: Norms Measure Size

Exploring why complete normed spaces without inner products (Banach spaces) are essential, with examples like Lp and C(K) spaces, and their impact on analysis and optimization.

Note: for the sake of readability, kets (vectors) will often have their type annotation dropped, writing just \(v := \vert v \rangle\), since the full notation gets quite messy inside a norm. Bras (covectors/linear functionals), however, will always be written with their type annotation, \(\langle f \vert\), to avoid ambiguity.

Welcome to Part 3 of our Functional Analysis crash course. In the previous post, we explored Hilbert spaces—complete inner product spaces that provide a rich geometric structure with notions of length, angle, and orthogonality. They are the natural home for quantum mechanics and Fourier analysis.

But what if the most natural way to measure a function’s “size” doesn’t come from an inner product? This question leads us directly to Banach spaces.

1. Introduction: Why Bother Without Inner Products?

Hilbert spaces are convenient. Their inner product gives us orthogonality, projections, and the powerful Riesz Representation Theorem, which cleanly identifies a Hilbert space with its dual.

However, in many applications, the concept of an “angle” is not relevant. We still need to measure the size of a vector (e.g., a function) and the distance between two vectors. Crucially, we also need our space to be complete—meaning that sequences that “should” converge actually do converge to a point within the space. Even when working in a space that could be a Hilbert space (like \(\mathbb{R}^n\)), we might intentionally choose a different norm, like the \(L_1\) or \(L_\infty\) norm, because its geometry is better suited to our problem.

This brings us to a broader class of spaces: Banach spaces. By definition, these are complete normed vector spaces, but their norm does not necessarily arise from an inner product. We willingly trade the geometric luxury of an inner product for the flexibility to use norms that are better suited for the problem at hand.

2. Norms Beyond Inner Products

A norm is a function that assigns a strictly positive length or size to every nonzero vector in a vector space, and assigns zero only to the zero vector.

Definition 2.1: Norm

A norm on a vector space \(X\) over a field \(\mathbb{F}\) (\(\mathbb{R}\) or \(\mathbb{C}\)) is a function \(\Vert \cdot \Vert: X \to \mathbb{R}\) satisfying for all \(\vert x \rangle, \vert y \rangle \in X\) and all scalars \(\alpha \in \mathbb{F}\):

  1. Non-negativity: \(\Vert x \Vert \ge 0\)
  2. Definiteness: \(\Vert x \Vert = 0 \iff \vert x \rangle = \vert \mathbf{0} \rangle\)
  3. Absolute homogeneity: \(\Vert \alpha x \Vert = \vert \alpha \vert \Vert x \Vert\)
  4. Triangle inequality: \(\Vert x + y \Vert \le \Vert x \Vert + \Vert y \Vert\)

A vector space equipped with a norm is a normed vector space.
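To make this concrete, here is a minimal numerical sketch (using NumPy, our choice for the examples in this post) showing that the same vector has a different "size" under different norms:

```python
import numpy as np

x = np.array([3.0, -4.0])

# The same vector, measured three ways.
print(np.linalg.norm(x, ord=1))       # L1 (Manhattan) norm: |3| + |-4| = 7
print(np.linalg.norm(x, ord=2))       # L2 (Euclidean) norm: sqrt(9 + 16) = 5
print(np.linalg.norm(x, ord=np.inf))  # L-infinity (sup) norm: max(3, 4) = 4
```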

The Parallelogram Law: A Litmus Test for Inner Products

How can we determine if a norm is induced by an inner product? The parallelogram law provides a definitive test. A norm \(\Vert \cdot \Vert\) is induced by an inner product \(\langle \cdot \vert \cdot \rangle\) (where \(\Vert x \Vert = \sqrt{\langle x \vert x \rangle}\)) if and only if it satisfies:

\[\Vert x+y \Vert^2 + \Vert x-y \Vert^2 = 2(\Vert x \Vert^2 + \Vert y \Vert^2) \quad \forall \vert x \rangle, \vert y \rangle \in X\]

Geometric Intuition and the Polarization Identity

The parallelogram law states that the sum of the squares of the diagonals’ lengths (\(\vert x+y \rangle\) and \(\vert x-y \rangle\)) equals the sum of the squares of the four sides’ lengths. This is a hallmark of Euclidean geometry, which is built on the dot product. If a norm fails this test, its geometry is not Euclidean.

If the law holds, the inner product can be recovered via the polarization identity. For real spaces:

\[\langle x \vert y \rangle = \frac{1}{4} \left( \Vert x+y \Vert^2 - \Vert x-y \Vert^2 \right)\]

If a norm fails the parallelogram law, it definitively does not come from an inner product. Let’s see this in action.

Example 1: The \(L_p\) Norms for \(p \neq 2\)

The \(L_p\)-norm is a common way to measure the size of functions or vectors. For a function \(f\) on a measure space \((\Omega, \mathcal{M}, \mu)\) and \(1 \le p < \infty\), the norm is:

\[\Vert f \Vert_p = \left( \int_{\Omega} \vert f(x) \vert^p d\mu(x) \right)^{1/p}\]

Let’s test the parallelogram law in the simple space \(\mathbb{R}^2\) with the \(L_1\)-norm (or “Manhattan norm”): \(\Vert \mathbf{x} \Vert_1 = \vert x_1 \vert + \vert x_2 \vert\). Let \(\vert x \rangle = (1,0)\) and \(\vert y \rangle = (0,1)\).

  • Sides: \(\Vert x \Vert_1 = 1\), \(\Vert y \Vert_1 = 1\)
  • Diagonals: \(\Vert x+y \Vert_1 = \Vert(1,1)\Vert_1 = 2\), \(\Vert x-y \Vert_1 = \Vert(1,-1)\Vert_1 = 2\)

Plugging into the parallelogram law:

  • Left-hand side (diagonals): \(\Vert x+y \Vert_1^2 + \Vert x-y \Vert_1^2 = 2^2 + 2^2 = 8\)
  • Right-hand side (sides): \(2(\Vert x \Vert_1^2 + \Vert y \Vert_1^2) = 2(1^2 + 1^2) = 4\)

Since \(8 \neq 4\), the \(L_1\)-norm violates the parallelogram law and is not derived from an inner product. The same is true for all \(L_p\)-norms where \(p \neq 2\).

Why use \(L_p\) norms (\(p \neq 2\))?

  • \(L_1\) Norm: Promotes sparsity. In machine learning (e.g., LASSO regression), penalizing a model’s weights with an \(L_1\) term (\(\lambda \Vert \mathbf{w} \Vert_1\)) forces many weights to become exactly zero. This is excellent for feature selection. The \(L_1\) loss function (Mean Absolute Error) is also more robust to outliers than the standard \(L_2\) loss (Mean Squared Error).
  • General \(L_p\) Norms: Provide a spectrum of error measures with varying sensitivities to large versus small values.

Example 2: The \(L_\infty\) (Supremum) Norm

For a continuous function \(f\) on a compact set \(K\) (e.g., an interval \([a,b]\)), the supremum norm measures its peak value:

\[\Vert f \Vert_\infty = \max_{x \in K} \vert f(x) \vert\]

Let’s test this in \(\mathbb{R}^2\) with \(\Vert \mathbf{x} \Vert_\infty = \max(\vert x_1 \vert, \vert x_2 \vert)\). Again, let \(\vert x \rangle = (1,0)\) and \(\vert y \rangle = (0,1)\).

  • Sides: \(\Vert x \Vert_\infty = \max(1,0)=1\), \(\Vert y \Vert_\infty = \max(0,1)=1\)
  • Diagonals: \(\Vert x+y \Vert_\infty = \Vert(1,1)\Vert_\infty = 1\), \(\Vert x-y \Vert_\infty = \Vert(1,-1)\Vert_\infty = 1\)

Checking the parallelogram law:

  • LHS: \(\Vert x+y \Vert_\infty^2 + \Vert x-y \Vert_\infty^2 = 1^2 + 1^2 = 2\)
  • RHS: \(2(\Vert x \Vert_\infty^2 + \Vert y \Vert_\infty^2) = 2(1^2 + 1^2) = 4\)

Since \(2 \neq 4\), the \(L_\infty\)-norm also fails the test.
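Both checks are easy to verify numerically. Here is a quick sketch (NumPy again; the vectors are the ones used above) that also recovers the inner product via polarization in the \(p=2\) case:

```python
import numpy as np

def parallelogram_gap(x, y, p):
    """LHS minus RHS of the parallelogram law under a given norm."""
    n = lambda v: np.linalg.norm(v, ord=p)
    return n(x + y)**2 + n(x - y)**2 - 2 * (n(x)**2 + n(y)**2)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

for p in (1, 2, np.inf):
    print(p, parallelogram_gap(x, y, p))
# 1    ->  4.0  (8 - 4: the law fails, matching the computation above)
# 2    ->  0.0  (the law holds: the Euclidean norm comes from a dot product)
# inf  -> -2.0  (2 - 4: the law fails)

# For p = 2, the polarization identity recovers the inner product:
n2 = lambda v: np.linalg.norm(v)
print(0.25 * (n2(x + y)**2 - n2(x - y)**2))  # 0.0, which equals x . y
```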

Why use the \(L_\infty\) norm? It measures the maximum deviation or “worst-case error.” A sequence of functions \(f_n\) converges to \(f\) in this norm (\(\Vert f_n - f \Vert_\infty \to 0\)) if and only if the convergence is uniform. Uniform convergence is a powerful property, ensuring that approximations are good “everywhere at once” and that properties like continuity are preserved in the limit.

3. Completeness: The Defining Feature of Banach Spaces

A norm induces a metric \(d(x,y) = \Vert x-y \Vert\), allowing us to define convergence and Cauchy sequences. As with Hilbert spaces, completeness is the crucial property that ensures our space has no “holes.”

Definition 3.1: Banach Space

A Banach space is a normed vector space that is complete with respect to the metric induced by its norm. This means every Cauchy sequence of vectors converges to a limit that is also within the space.

All Hilbert spaces are Banach spaces, but the converse is not true. The spaces defined by the norms we just explored are prime examples of Banach spaces that are not Hilbert spaces.

Key Examples of Banach Spaces

  • The \(L_p(\Omega)\) spaces: For any measure space \(\Omega\) and \(1 \le p \le \infty\), the space \(L_p(\Omega)\) of functions with a finite \(L_p\)-norm is a Banach space. The proof of this (part of the Riesz-Fischer theorem) is a cornerstone of modern analysis.
    • For \(p=2\), \(L_2(\Omega)\) is a Hilbert space.
    • For \(p \neq 2\), \(L_p(\Omega)\) is a Banach space but not a Hilbert space.
  • The space \(C(K)\): The space of all continuous functions on a compact set \(K\) (e.g., \([a,b]\)), equipped with the supremum norm \(\Vert f \Vert_\infty\), is a Banach space.

    Proof Sketch: Completeness of \(C(K)\)

    Let \((f_n)\) be a Cauchy sequence in \(C(K)\). The proof follows three steps:

    1. Define a candidate limit: For each point \(x \in K\), the sequence of numbers \((f_n(x))\) is Cauchy in \(\mathbb{R}\) (or \(\mathbb{C}\)), so it converges. We define our limit function as \(f(x) := \lim_{n \to \infty} f_n(x)\).
    2. Show uniform convergence: We show that the convergence \(f_n \to f\) is uniform. This means \(\Vert f_n - f \Vert_\infty \to 0\), so the convergence happens in the norm of the space.
    3. Show the limit is in the space: A key theorem of analysis states that the uniform limit of a sequence of continuous functions is itself continuous. Therefore, \(f \in C(K)\).

    Since every Cauchy sequence converges to a limit within the space, \(C(K)\) is complete and thus a Banach space. (A small numerical illustration of why the choice of norm matters here follows this list.)

  • The sequence spaces \(\ell_p\): For \(1 \le p \le \infty\), the space \(\ell_p\) of sequences \(\mathbf{x}=(x_1, x_2, \dots)\) with finite \(\Vert \mathbf{x} \Vert_p\) is a Banach space. Again, only \(\ell_2\) is a Hilbert space.
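To see why the choice of norm matters for completeness, consider \(f_n(x) = x^n\) on \([0,1]\): each \(f_n\) is continuous, but the pointwise limit jumps from \(0\) to \(1\) at \(x=1\). A minimal numerical sketch (a grid-based NumPy approximation, not a proof):

```python
import numpy as np

# f_n(x) = x^n on [0,1]: continuous for every n, but the pointwise limit
# (0 on [0,1), then 1 at x = 1) is discontinuous.
grid = np.linspace(0.0, 1.0, 10_001)
f = lambda n: grid**n

for n in (10, 100, 1000):
    l1_dist  = np.mean(np.abs(f(n) - f(2 * n)))  # ~ integral |f_n - f_2n| dx
    sup_dist = np.max(np.abs(f(n) - f(2 * n)))   # sup-norm distance
    print(n, l1_dist, sup_dist)

# The L1 distances shrink toward 0 (the tail looks Cauchy in the L1 norm),
# but the sup-norm distance stays near 0.25 for every n: (f_n) is NOT
# Cauchy in C([0,1]), consistent with its pointwise limit being
# discontinuous and hence outside the space.
```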

4. The Hahn-Banach Theorem: A Pillar of Analysis

Despite lacking a universal notion of orthogonality, Banach spaces have a rich analytical structure. The most fundamental tool for understanding this structure is the Hahn-Banach Theorem. In essence, it guarantees that there are “enough” bounded linear functionals to make the theory of dual spaces powerful and interesting.

Theorem 4.1: The Hahn-Banach Theorem (Analytic Form)

Let \(X\) be a vector space over \(\mathbb{F}\) (\(\mathbb{R}\) or \(\mathbb{C}\)), and let \(p: X \to \mathbb{R}\) be a sublinear functional, i.e., one satisfying \(p(\alpha x) = \alpha p(x)\) for \(\alpha \ge 0\) and \(p(x+y) \le p(x)+p(y)\). Every norm (indeed, every seminorm) is sublinear.

Let \(Z\) be a subspace of \(X\), and let \(\langle g \vert\) be a linear functional on \(Z\) that is dominated by \(p\), meaning \(\vert \langle g \vert z \rangle \vert \le p(z)\) for all \(\vert z \rangle \in Z\). (For real scalars, domination can be weakened to \(\langle g \vert z \rangle \le p(z)\); the absolute-value form stated here is the one used for complex scalars, where \(p\) is taken to be a seminorm, as it is in the normed-space application below.)

Then there exists a linear functional \(\langle f \vert\) defined on all of \(X\) such that:

  1. Extension: \(\langle f \vert z \rangle = \langle g \vert z \rangle\) for all \(\vert z \rangle \in Z\).
  2. Domination: \(\vert \langle f \vert x \rangle \vert \le p(x)\) for all \(\vert x \rangle \in X\).

In the context of normed spaces, we set \(p(x) = \Vert \langle g \vert \Vert_\ast \Vert x \Vert\). The theorem then states that any bounded linear functional on a subspace can be extended to the entire space without increasing its norm.

Key Consequences of Hahn-Banach

The Hahn-Banach theorem is a pure existence theorem (its proof relies on Zorn’s Lemma), but its consequences are profound and practical.

  1. The Dual Space is Rich: It guarantees that the dual space \(X^\ast\) of any non-trivial normed space is itself non-trivial. It contains far more than just the zero functional.
  2. Separation of Convex Sets: In its geometric form, the theorem allows us to find a hyperplane that separates two disjoint convex sets. This is fundamental to optimization theory.
  3. Existence of “Witness” Functionals: For any vector in the space, we can find a functional that “sees” it perfectly. This is arguably its most important consequence for analysis.

Proposition 4.2: Consequence of Hahn-Banach

For any non-zero vector \(\vert x_0 \rangle\) in a normed space \(X\), there exists a bounded linear functional \(\langle f \vert\) in the dual space \(X^\ast\) such that:

\[\Vert \langle f \vert \Vert_\ast = 1 \quad \text{and} \quad \langle f \vert x_0 \rangle = \Vert x_0 \Vert\]

This proposition confirms that the dual space is rich enough to distinguish all vectors. If \(\vert x \rangle \neq \vert y \rangle\), there is a functional \(\langle f \vert\) such that \(\langle f \vert x-y \rangle = \Vert x-y \Vert \neq 0\), so \(\langle f \vert x \rangle \neq \langle f \vert y \rangle\).
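In \((\mathbb{R}^n, \Vert \cdot \Vert_1)\), the witness functional can be written down explicitly, which makes for a nice sanity check (a sketch; in general the theorem is non-constructive):

```python
import numpy as np

# In (R^n, ||.||_1) the dual norm is ||.||_inf, and f = sign(x0) is a
# witness functional: ||f||_inf = 1 and f(x0) = sum |x0_i| = ||x0||_1.
x0 = np.array([2.0, -3.0, 0.5])
f = np.sign(x0)

print(np.max(np.abs(f)))            # dual (L-inf) norm of f: 1.0
print(f @ x0, np.sum(np.abs(x0)))   # both 5.5: f attains ||x0||_1 at x0
```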

5. The Dual Space and the Duality Mapping

The Hahn-Banach theorem breathes life into the concept of the dual space. This section introduces the duality mapping, the standard tool of functional analysis for "type-casting" between the primal and dual spaces.

Definition 5.1: The Dual Space and Dual Norm

Let \(X\) be a normed vector space.

  1. The (continuous) dual space, denoted \(X^\ast\), is the vector space of all bounded linear functionals \(\langle f \vert : X \to \mathbb{F}\).
  2. The dual norm on \(X^\ast\) is defined as:

    \[\Vert \langle f \vert \Vert_\ast = \sup_{\Vert x \Vert = 1} \vert \langle f \vert x \rangle \vert\]

It’s a theorem that \(X^\ast\) is always a Banach space, whether or not \(X\) itself is complete. Note that the supremum in the dual norm need not be attained by any vector in general; what Hahn-Banach does guarantee (see Proposition 4.2) is the mirror statement: every non-zero vector admits a norm-one functional that attains its norm. (In fact, by James's theorem, every functional attains its norm precisely when \(X\) is reflexive.)

Now we introduce the standard tool from functional analysis for relating the primal and dual spaces.

Definition 5.2: The Duality Mapping (Standard Definition)

The duality mapping \(J: X \to 2^{X^\ast}\) is a mapping that associates each vector \(\vert x \rangle \in X\) with the set of its “support functionals” \(\langle f \vert \in X^\ast\). These are the functionals that are perfectly aligned with \(\vert x \rangle\) in the sense that they satisfy:

  1. Alignment:
\[\langle f \vert x \rangle = \Vert \langle f \vert \Vert_\ast \Vert x \Vert\]
  2. Isometry:
\[\Vert \langle f \vert \Vert_\ast = \Vert x \Vert\]

For practical algebraic computation, we can also use the following equivalent, more convenient formulation:

\[\langle f \vert x \rangle = \Vert x \Vert^2 = \Vert \langle f \vert \Vert_\ast^2\]

The Hahn-Banach theorem guarantees that this set \(J(x)\) is always non-empty. This mapping is fundamental to the theory of monotone operators and nonlinear analysis.
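As a concrete worked example, take \(X = \ell_p\) with \(1 < p < \infty\) (where \(J\) is single-valued); the following closed form is a standard computation, though not spelled out in the references below:

\[J(x)_i = \Vert x \Vert_p^{2-p} \, \vert x_i \vert^{p-1} \operatorname{sgn}(x_i)\]

Both conditions can be checked directly: \(\langle f \vert x \rangle = \Vert x \Vert_p^{2-p} \sum_i \vert x_i \vert^p = \Vert x \Vert_p^2\), and \(\Vert \langle f \vert \Vert_q = \Vert x \Vert_p\) where \(1/p + 1/q = 1\). For \(p = 2\) this reduces to \(J(x) = x\), recovering the Riesz identification.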

Analogy: Mechanical Leverage

Think of the covector \(\langle f \vert\) as a force and \(\vert x \rangle\) as a desired displacement. The duality mapping embodies the optimal mechanical advantage:

\[J(x) = \underset{\langle f \vert}{\mathrm{argmax}} \; \frac{\langle f \vert x \rangle}{\Vert f \Vert_\ast}\]

where \(\Vert f \Vert_\ast\) represents the “effort budget”. The maximum is achieved when the force and displacement are collinear.

Proposition 5.3: Properties of the Duality Mapping

Because the duality mapping finds the covectors aligned with a given vector, saturating the duality inequality \(\vert \langle g \vert x \rangle \vert \le \Vert \langle g \vert \Vert_\ast \Vert x \Vert\) (the generalized Cauchy-Schwarz inequality), we can write it as a variational problem.

\[J(x) = \underset{\langle g \vert \in X^\ast,\Vert \langle g \vert \Vert_\ast = \Vert x \Vert}{\mathrm{argmax}} \; \vert \langle g \vert x \rangle \vert\]

The function \(\phi(x) = \frac{1}{2} \Vert x \Vert^2\) is the canonical convex gauge of vector magnitude; it is Gâteaux differentiable away from the origin exactly when the norm is, i.e., when \(X\) is a smooth Banach space. What's interesting is that the duality mapping is precisely the subdifferential of this function.

\[J(x)=\partial \phi(x)\]

Another common interpretation comes from the proximal viewpoint in convex optimization. If we consider a local/proximal model for a loss, with step size \(\lambda>0\):

\[\mathcal{L}(w+\Delta w) \approx Q_\lambda(w,\Delta w) := \mathcal{L}(w) + \langle \nabla \mathcal{L}(w) \vert \Delta w \rangle + \frac{1}{2\lambda} \Vert \Delta w \Vert^2\]

then we see that \(J^{-1}\) solves for the new iterate in proximal gradient descent. So it serves as a local minimization oracle.

\[\underset{\Delta w \in X}{\mathrm{argmin}} \; Q_\lambda (w,\Delta w) = \underset{\Delta w \in X}{\mathrm{argmin}} \; \langle \nabla \mathcal{L}(w) \vert \Delta w \rangle + \frac{1}{2\lambda} \Vert \Delta w \Vert^2 = -\lambda J^{-1}(\nabla \mathcal{L}(w))\]
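Filling in the intermediate step: setting the subgradient of \(Q_\lambda\) in \(\Delta w\) to zero and using \(\partial \left( \frac{1}{2} \Vert \cdot \Vert^2 \right) = J\) together with the homogeneity of \(J\) (\(J(\alpha x) = \alpha J(x)\) for \(\alpha > 0\), and \(J(-x) = -J(x)\)):

\[0 \in \langle \nabla \mathcal{L}(w) \vert + \tfrac{1}{\lambda} J(\Delta w) \implies J(\Delta w) = -\lambda \, \langle \nabla \mathcal{L}(w) \vert \implies \Delta w = J^{-1}\!\left(-\lambda \nabla \mathcal{L}(w)\right) = -\lambda \, J^{-1}(\nabla \mathcal{L}(w))\]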

The Optimizer’s View: The “Dualize” Operation

In optimization, the practical problem is different. Given a gradient covector \(\langle g \vert \in X^\ast\), we need to find the direction of steepest ascent, which is a vector \(\vert v \rangle \in X\). In his work, Jeremy Bernstein refers to this operation as dualize.

The core idea is to find the unit vector that maximizes the linear functional’s value. The dualize mapping is defined as:

\[\text{dualize}(\langle g \vert) := \underset{\vert v \rangle \in X, \Vert v \Vert=1}{\text{argmax}} \langle g \vert v \rangle\]

The result of this operation is the unit vector \(\vert v \rangle\) pointing in the direction of steepest increase for the functional \(\langle g \vert\). By definition of the dual norm, the value of this maximum is \(\langle g \vert v \rangle = \Vert \langle g \vert \Vert_\ast\).

This dualize operation is the crucial step for converting a gradient (covector) into an update direction (vector) for optimization algorithms in general Banach spaces.

Compared to the standard definition of the duality mapping \(J\) in functional analysis, Jeremy Bernstein's definition extracts only the direction, decoupling it from the gradient's dual norm.

\[\Vert \langle g \vert \Vert_\ast \text{dualize}(\langle g \vert) = J^{-1}(\langle g \vert)\]
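Here is a minimal sketch of dualize for a few norms on \(\mathbb{R}^n\), plus the spectral norm on matrices (the Muon case). The closed forms are standard facts about dual norms; the function names are our own:

```python
import numpy as np

def dualize_l2(g):
    # Hilbert case: steepest ascent is just the normalized gradient.
    return g / np.linalg.norm(g)

def dualize_linf(g):
    # Steepest ascent w.r.t. ||v||_inf = 1: every coordinate pushes at
    # full magnitude, giving sign(g) (the signSGD direction).
    return np.sign(g)

def dualize_l1(g):
    # Steepest ascent w.r.t. ||v||_1 = 1: put all mass on the single
    # coordinate where the gradient is largest in magnitude.
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = np.sign(g[i])
    return v

def dualize_spectral(G):
    # Steepest ascent w.r.t. the spectral norm of a matrix: U V^T from
    # the SVD of G, i.e. the orthogonalized gradient used by Muon.
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

g = np.array([0.5, -2.0, 1.0])
for d in (dualize_l2, dualize_linf, dualize_l1):
    v = d(g)
    # g @ v equals the corresponding dual norm of g in each case
    print(d.__name__, v, g @ v)
```

In each case \(\langle g \vert v \rangle\) comes out to the dual norm of \(g\): \(\Vert g \Vert_2\), then \(\Vert g \Vert_1\) (the dual of \(\ell_\infty\)), then \(\Vert g \Vert_\infty\) (the dual of \(\ell_1\)).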

Contrast with Hilbert Spaces

In a Hilbert space \(H\) with norm induced by the inner product, life is simpler. The Riesz Representation Theorem provides a unique vector \(\vert g \rangle\) for every covector \(\langle g \vert\) such that \(\langle g \vert v \rangle = \langle g, v \rangle\) (the inner product). The direction of steepest ascent for \(\langle g \vert\) is simply the normalized vector \(\vert g \rangle / \Vert g \Vert\). The dualize operation becomes:

\[\text{dualize}(\langle g \vert) = \frac{\vert g \rangle}{\Vert g \Vert}\]

Because of this unique correspondence, we often blur the distinction between \(H\) and \(H^\ast\). This is a common source of confusion in machine learning, where the gradient is routinely treated as a vector/ket rather than a covector/bra/linear functional/1-form. Because we parameterize everything with real numbers, it is tempting to assume we are working in \(\mathbb{R}^n\) with the Euclidean inner product \(\langle x \vert y \rangle := x^T y\), which is a Hilbert space. In doing so, however, we ignore the underlying geometry of the space, which empirically is often better captured by non-induced norms, such as the spectral norm in the Muon optimizer. In a general Banach space, we must use the more general argmax-based dualize definition to find this direction.

6. Other Foundational Theorems and Applications

The analytical power of Banach spaces is further demonstrated by a “holy trinity” of theorems about bounded operators and a powerful fixed-point theorem.

  • The “Holy Trinity” of Bounded Operators:
    1. Uniform Boundedness Principle: A family of operators that is pointwise bounded is uniformly bounded. (Pointwise good implies uniformly good).
    2. Open Mapping Theorem: A surjective bounded linear operator between Banach spaces maps open sets to open sets. (A key corollary is the Bounded Inverse Theorem).
    3. Closed Graph Theorem: An operator is bounded if and only if its graph is a closed set. This is often an easier way to prove an operator is continuous.
  • The Banach Fixed-Point Theorem (Contraction Mapping Principle): Let \((X,d)\) be a complete metric space (every Banach space is one). If an operator \(T: X \to X\) is a contraction—meaning it shrinks distances by a uniform factor \(k < 1\):

    \[d(T(x), T(y)) \le k \cdot d(x,y)\]

    Then \(T\) has one and only one fixed point (\(x^\ast\) such that \(T(x^\ast) = x^\ast\)). This point can be found by iterating \(x_{n+1} = T(x_n)\) from any starting point \(x_0 \in X\).

    Application: Solving Differential Equations

    The initial value problem \(y'(t) = F(t, y(t))\) with \(y(t_0)=y_0\) can be rewritten as an integral equation:

    \[y(t) = y_0 + \int_{t_0}^t F(s, y(s)) ds\]

    A solution \(y(t)\) is a fixed point of the operator \(\mathcal{T}\) defined by the right-hand side: \((\mathcal{T}y)(t) = y_0 + \int_{t_0}^t F(s, y(s)) ds\). If \(F\) is Lipschitz continuous in its second argument, then for a small enough time interval, \(\mathcal{T}\) is a contraction on the Banach space \(C(I)\) of continuous functions. The Banach Fixed-Point Theorem then guarantees a unique local solution. This is the essence of the Picard-Lindelöf theorem.
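Here is a minimal numerical sketch of Picard iteration for \(y' = y\), \(y(0)=1\) on \([0,1]\), whose exact solution is \(e^t\) (the grid-based trapezoidal integration is our simplification):

```python
import numpy as np

# Solve y'(t) = F(t, y) = y, y(0) = 1, by iterating the integral operator
# (T y)(t) = y0 + integral_0^t F(s, y(s)) ds on a grid.
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
y0 = 1.0
F = lambda t, y: y  # Lipschitz in y, so T contracts on small intervals

y = np.full_like(t, y0)  # initial guess: the constant function y0
for _ in range(30):
    integrand = F(t, y)
    # cumulative trapezoidal integral of F(s, y(s)) from 0 to each t
    integral = np.concatenate(
        ([0.0], np.cumsum((integrand[1:] + integrand[:-1]) / 2 * dt))
    )
    y = y0 + integral

print(np.max(np.abs(y - np.exp(t))))  # sup-norm error vs e^t: tiny
```

Each Picard iterate here is exactly the next Taylor partial sum of \(e^t\) (up to quadrature error), so 30 iterations leave only grid discretization error.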

7. Banach Spaces in Machine Learning and Optimization

While much of ML operates in finite-dimensional Euclidean space (a Hilbert space), the theory behind advanced methods relies heavily on Banach space concepts.

  • Sparsity via \(L_1\) Regularization: Penalizing model weights with the \(L_1\)-norm (\(\lambda \Vert \mathbf{w} \Vert_1\)) is the core of LASSO and other techniques for feature selection. The “sharp corners” of the \(L_1\) unit ball geometrically encourage solutions where many weights are exactly zero. Optimization with the non-differentiable \(L_1\) norm requires tools like subgradient calculus, which are naturally studied in this context. (A tiny illustration of the resulting sparsity follows this list.)
  • Robustness via \(L_1\) Loss: Using Mean Absolute Error (\(L_1\) loss) instead of Mean Squared Error (\(L_2\) loss) makes models less sensitive to outliers in the training data.
  • Probabilistic Models: Probability theory is built on measure theory. Spaces like \(L_1(\Omega, \mathcal{F}, P)\) are Banach spaces essential for defining expected values, \(E[X] = \int X dP\).
  • Theory of Optimization: Analyzing the convergence of algorithms like gradient descent in non-Euclidean geometries requires the machinery described above. Converting a gradient (covector) into an update direction (vector) requires the dualize operation to find the direction of steepest descent.
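To illustrate the sparsity mechanism from the first bullet above: the proximal operator of the \(L_1\) penalty is soft-thresholding, which snaps small coordinates to exactly zero (a standard fact in convex optimization; this sketch is not full LASSO):

```python
import numpy as np

def soft_threshold(w, tau):
    # prox of tau * ||.||_1: shrink every coordinate toward 0, setting
    # anything with |w_i| <= tau exactly to 0 -- the source of sparsity.
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

w = np.array([0.9, -0.05, 0.3, -0.001, 2.0])
print(soft_threshold(w, tau=0.1))  # [ 0.8 -0.   0.2 -0.   1.9]
```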

8. Conclusion: A Broader Analytical Landscape

Banach spaces generalize Hilbert spaces by dropping the requirement that a norm must come from an inner product. This trade-off is immensely fruitful. While we lose the universal geometric intuition of angles and orthogonality, we gain a far broader framework capable of handling diverse measures of “size.” The \(L_1\) norm for sparsity, the \(L_\infty\) norm for uniform control, and the general \(L_p\) norms for modeling different error sensitivities are indispensable tools in modern science and engineering.

The crucial property of completeness is retained, providing a solid foundation for analysis. Foundational results like the Hahn-Banach Theorem, the major theorems on bounded operators, and the Banach Fixed-Point Theorem form the backbone of modern analysis. They provide the tools to solve differential equations, understand operator theory, and build the theoretical underpinnings for advanced optimization and machine learning. In short, Banach spaces provide the language to explore a vast and varied landscape of mathematical structures far beyond the confines of Euclidean geometry.

Next Up: In the final post of this mini-series, we will focus on linear operators, exploring how Matrix Spectral Analysis generalizes to operators on infinite-dimensional Hilbert and Banach spaces.

9. Summary Cheat Sheet

| Concept | Description | Key Example(s) | Why Important |
| --- | --- | --- | --- |
| Normed Space | Vector space with a function \(\Vert \cdot \Vert\) defining length/size. | \(C([a,b])\) with \(\Vert \cdot \Vert_\infty\); \(L_p\) spaces | Basic structure for measuring size and distance. |
| Parallelogram Law | Identity \(\Vert x+y \Vert^2 + \Vert x-y \Vert^2 = 2(\Vert x \Vert^2 + \Vert y \Vert^2)\); test for an inner-product norm. | Fails for \(L_p\) (\(p\neq 2\)) and \(L_\infty\) norms. | Distinguishes Hilbert space norms from general norms. |
| \(L_p\) Norms (\(p\neq 2\)) | \(\Vert f \Vert_p = (\int \vert f\vert^p)^{1/p}\); measures size with varying sensitivity. | \(L_1\) (Manhattan/taxicab). | Model different error types; \(L_1\) promotes sparsity. |
| \(L_\infty\) Norm | \(\Vert f \Vert_\infty = \sup \vert f\vert\); measures the peak value or worst-case error. | Space of continuous functions \(C(K)\). | Convergence in this norm is exactly uniform convergence. |
| Banach Space | A complete normed vector space. | \(L_p\) spaces (\(1\le p \le \infty\)); \(C(K)\). | Ensures Cauchy sequences converge; robust analytical framework. |
| Dual Space \(X^\ast\) | Space of all bounded linear functionals \(f: X \to \mathbb{F}\); always a Banach space. | \((L_p)^\ast = L_q\) for \(1<p<\infty\); \((L_1)^\ast = L_\infty\). | The natural home for gradients (covectors). |
| Hahn-Banach Theorem | Guarantees norm-preserving extension of bounded linear functionals from a subspace to the full space. | — | Ensures the dual space is rich enough to define concepts like steepest ascent. |
| dualize operation | Optimization-centric map: dualize(\(\langle g \vert\)) finds the unit vector \(\vert v \rangle\) maximizing \(\langle g \vert v \rangle\), i.e., the direction of steepest ascent for \(\langle g \vert\). | In Hilbert space, \(\vert g \rangle / \Vert g \Vert\). | Converts a gradient (covector) into an update direction (vector). |
| Banach Fixed-Point Thm. | A contraction map \(T\) on a complete metric space has a unique fixed point \(T(x^\ast)=x^\ast\). | Picard's method for solving ODEs. | Guarantees existence/uniqueness of solutions; basis for iterative algorithms. |
| "Holy Trinity" | Uniform Boundedness Principle, Open Mapping Thm., Closed Graph Thm.; foundational results for bounded linear operators. | — | Govern the fundamental properties of operators between Banach spaces. |

Further Reading

Wikipedia contributors. (2025, April 14). Banach space. Wikipedia. https://en.wikipedia.org/wiki/Banach_space#Linear_operators,_isomorphisms

Wikipedia contributors. (2024, July 26). List of Banach spaces. Wikipedia. https://en.wikipedia.org/wiki/List_of_Banach_spaces

This post is licensed under CC BY 4.0 by the author.