Differential Geometry Part 3: Connections, Covariant Derivatives, and Curvature
Understanding how vector fields change (connections, covariant derivatives), how vectors are transported (parallel transport), and how manifolds bend (curvature).
Introduction
In the previous parts, we established smooth manifolds as our geometric spaces (Part 1) and endowed them with Riemannian metrics to measure lengths and angles (Part 2). Now, we need tools to understand how geometric objects, particularly vector fields, change as we move across the manifold.
- How do we differentiate a vector field in a way that is intrinsic to the manifold, not dependent on a specific embedding in a higher-dimensional Euclidean space?
- How can we compare tangent vectors at different points? This leads to the idea of parallel transport.
- How do we quantify the “bending” or curvature of a manifold?
These concepts are captured by connections, covariant derivatives, and the Riemann curvature tensor. They are vital for a deeper understanding of optimization on manifolds, as curvature, for instance, directly impacts the behavior of geodesics and the complexity of loss landscapes.
1. The Need for a Covariant Derivative
Consider a vector field \(Y\) on a manifold \(M\) and another vector field \(X\) (or a curve \(\gamma(t)\) with tangent \(X = \gamma'(t)\)). We want to define the derivative of \(Y\) in the “direction” of \(X\), denoted \(\nabla_X Y\). In \(\mathbb{R}^n\), if \(Y(x) = (Y^1(x), \dots, Y^n(x))\) and \(X_p = (X^1, \dots, X^n)\), the directional derivative is
\[(\nabla_X Y)(p) = \lim_{h \to 0} \frac{Y(p+hX_p) - Y(p)}{h} = \sum_j X^j \frac{\partial Y}{\partial x^j}(p)\]
Each component is \((\nabla_X Y)^i = X(Y^i) = \sum_j X^j \frac{\partial Y^i}{\partial x^j}\). This simple component-wise differentiation doesn’t work directly on a general manifold because:
- \(Y(p+hX_p)\) is not well-defined: there’s no canonical “addition” on a manifold.
- Even if we use charts, \(Y(q) - Y(p)\) is not meaningful as \(T_q M\) and \(T_p M\) are different vector spaces.
- The basis vectors \(\partial/\partial x^i\) themselves change from point to point if the coordinates are curvilinear. Taking partial derivatives of components \(Y^j\) of \(Y = Y^j \partial_j\) does not capture this change in basis vectors.
We need a derivative operator that produces a tangent vector and behaves like a derivative (linear, with a Leibniz rule). This is exactly what an affine connection provides.
Definition. Affine Connection (Covariant Derivative)
An affine connection \(\nabla\) on a smooth manifold \(M\) is an operator
\[\nabla: \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M), \quad (X, Y) \mapsto \nabla_X Y\]
(where \(\mathfrak{X}(M)\) is the space of smooth vector fields on \(M\)) satisfying:
- \(C^\infty(M)\)-linearity in \(X\): \(\nabla_{fX_1 + gX_2} Y = f \nabla_{X_1} Y + g \nabla_{X_2} Y\) for \(f,g \in C^\infty(M)\).
- \(\mathbb{R}\)-linearity in \(Y\): \(\nabla_X (aY_1 + bY_2) = a \nabla_X Y_1 + b \nabla_X Y_2\) for \(a,b \in \mathbb{R}\).
- Leibniz rule (product rule) in \(Y\): \(\nabla_X (fY) = (Xf)Y + f \nabla_X Y\) for \(f \in C^\infty(M)\).
The vector field \(\nabla_X Y\) is called the covariant derivative of \(Y\) with respect to \(X\). If \(X_p \in T_pM\), then \((\nabla_X Y)_p\) depends only on \(X_p\) and the values of \(Y\) in a neighborhood of \(p\). So we can also define \(\nabla_v Y\) for \(v \in T_p M\).
In local coordinates \((x^1, \dots, x^n)\) with basis vector fields \(\partial_i = \partial/\partial x^i\), the connection is determined by how it acts on these basis fields:
\[\nabla_{\partial_i} \partial_j = \sum_{k=1}^n \Gamma^k_{ij} \partial_k\]
The \(n^3\) functions \(\Gamma^k_{ij}\) are called the Christoffel symbols (or connection coefficients) of \(\nabla\). These are the same symbols that appeared in the geodesic equation in Part 2, once a specific connection is chosen (the Levi-Civita connection, introduced below). With these, the covariant derivative of \(Y = Y^j \partial_j\) along \(X = X^i \partial_i\) has components:
\[(\nabla_X Y)^k = X(Y^k) + \sum_{i,j=1}^n X^i Y^j \Gamma^k_{ij} = \sum_{i=1}^n X^i \left( \frac{\partial Y^k}{\partial x^i} + \sum_{j=1}^n Y^j \Gamma^k_{ij} \right)\]
The term \(\sum_j Y^j \Gamma^k_{ij}\) corrects for the change in the coordinate basis vectors.
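To make the coordinate formula concrete, here is a minimal SymPy sketch (the helper name `covariant_derivative` is ours, purely for illustration) that evaluates \((\nabla_X Y)^k\) in polar coordinates on \(\mathbb{R}^2\), where the nonzero Christoffel symbols of the flat metric are the standard \(\Gamma^r_{\theta\theta} = -r\) and \(\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r\):

```python
import sympy as sp

# Polar coordinates (r, theta) on R^2. The Euclidean metric is g = diag(1, r^2);
# its nonzero Christoffel symbols are the standard
# Gamma^r_{theta theta} = -r, Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r.
r, th = sp.symbols('r theta', positive=True)
coords = [r, th]
n = 2

Gamma = [[[sp.Integer(0)] * n for _ in range(n)] for _ in range(n)]  # Gamma[k][i][j]
Gamma[0][1][1] = -r        # Gamma^r_{theta theta}
Gamma[1][0][1] = 1 / r     # Gamma^theta_{r theta}
Gamma[1][1][0] = 1 / r     # Gamma^theta_{theta r}

def covariant_derivative(X, Y):
    """(nabla_X Y)^k = sum_i X^i (dY^k/dx^i + sum_j Gamma^k_{ij} Y^j)."""
    return [sp.simplify(sum(X[i] * (sp.diff(Y[k], coords[i])
                                    + sum(Gamma[k][i][j] * Y[j] for j in range(n)))
                            for i in range(n)))
            for k in range(n)]

# Differentiate the coordinate field d/dtheta along itself:
print(covariant_derivative([0, 1], [0, 1]))   # -> [-r, 0], i.e. -r d/dr
```

The output \(\nabla_{\partial_\theta}\partial_\theta = -r\,\partial_r\) is exactly the basis-change correction discussed above: a coordinate field with constant components can still have a nonzero covariant derivative in curvilinear coordinates.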
2. The Levi-Civita Connection
A general manifold can have many affine connections. If \((M,g)\) is a Riemannian manifold, there is a unique connection that is “compatible” with the metric and “symmetric”: the Levi-Civita connection.
Theorem. Fundamental Theorem of Riemannian Geometry
On any Riemannian manifold \((M,g)\), there exists a unique affine connection \(\nabla\) (called the Levi-Civita connection or Riemannian connection) satisfying:
Metric compatibility: \(\nabla\) preserves the metric. That is, for any vector fields \(X, Y, Z\):
\[X(g(Y,Z)) = g(\nabla_X Y, Z) + g(Y, \nabla_X Z)\]
(This means the metric is “covariantly constant”: \(\nabla g = 0\).)
Torsion-free (Symmetry): For any vector fields \(X, Y\):
\[\nabla_X Y - \nabla_Y X = [X,Y]\]where \([X,Y]\) is the Lie bracket of vector fields. In local coordinates, this is equivalent to \(\Gamma^k_{ij} = \Gamma^k_{ji}\) (symmetry of Christoffel symbols in lower indices).
The Christoffel symbols for the Levi-Civita connection are precisely those given in Part 2, derived from the metric \(g_{ij}\):
\[\Gamma^k_{ij} = \frac{1}{2} \sum_{l=1}^n g^{kl} \left( \frac{\partial g_{jl}}{\partial x^i} + \frac{\partial g_{il}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^l} \right)\]
From now on, unless stated otherwise, \(\nabla\) will refer to the Levi-Civita connection on a Riemannian manifold.
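This formula can be checked symbolically. The sketch below (a minimal SymPy implementation of the formula above, not a library API) computes the Christoffel symbols of the round metric \(g = \mathrm{diag}(1, \sin^2\theta)\) on the unit sphere:

```python
import sympy as sp

# Unit sphere in spherical coordinates (theta, phi); metric g = diag(1, sin^2 theta).
theta, phi = sp.symbols('theta phi', positive=True)
coords = [theta, phi]
n = 2

g = sp.Matrix([[1, 0], [0, sp.sin(theta) ** 2]])   # metric g_ij
g_inv = g.inv()                                     # inverse metric g^kl

# Gamma^k_{ij} = (1/2) sum_l g^{kl} (d_i g_{jl} + d_j g_{il} - d_l g_{ij})
Gamma = [[[sp.simplify(sp.Rational(1, 2) * sum(
              g_inv[k, l] * (sp.diff(g[j, l], coords[i])
                             + sp.diff(g[i, l], coords[j])
                             - sp.diff(g[i, j], coords[l]))
              for l in range(n)))
           for j in range(n)]
          for i in range(n)]
         for k in range(n)]

for k in range(n):
    for i in range(n):
        for j in range(n):
            if Gamma[k][i][j] != 0:
                print(f"Gamma^{coords[k]}_({coords[i]},{coords[j]}) =", Gamma[k][i][j])
# Expected nonzero symbols: Gamma^theta_(phi,phi) = -sin(theta)*cos(theta),
# Gamma^phi_(theta,phi) = Gamma^phi_(phi,theta) = cos(theta)/sin(theta).
```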
3. Parallel Transport
The covariant derivative allows us to define what it means for a vector field to be “constant” along a curve. Let \(\gamma: I \to M\) be a smooth curve, and let \(V(t)\) be a vector field along \(\gamma\) (i.e., \(V(t) \in T_{\gamma(t)}M\) for each \(t \in I\)). The covariant derivative of \(V\) along \(\gamma\) is denoted \(\frac{DV}{dt}\) or \(\nabla_{\gamma'(t)} V\).
Definition. Parallel Transport
A vector field \(V(t)\) along a curve \(\gamma(t)\) is said to be parallel transported along \(\gamma\) if its covariant derivative along \(\gamma\) is zero:
\[\frac{DV}{dt}(t) = \nabla_{\gamma'(t)} V(t) = 0 \quad \text{for all } t \in I\]
Given a vector \(v_0 \in T_{\gamma(t_0)}M\) at a point \(\gamma(t_0)\) on the curve, there exists a unique parallel vector field \(V(t)\) along \(\gamma\) such that \(V(t_0) = v_0\). This defines a linear isomorphism, called the parallel transport map \(P_{\gamma, t_0, t_1}: T_{\gamma(t_0)}M \to T_{\gamma(t_1)}M\), by \(P_{\gamma, t_0, t_1}(v_0) = V(t_1)\). If the connection is metric-compatible (like Levi-Civita), parallel transport preserves inner products, lengths, and angles: \(g(V(t), W(t)) = \text{const}\) if \(V,W\) are parallel along \(\gamma\).
Holonomy. If you parallel transport a vector around a closed loop \(\gamma\), it may not return to its original orientation. The difference measures the holonomy of the connection, which is related to curvature. On a flat space like \(\mathbb{R}^n\) with the standard connection, a vector always returns to itself. On a sphere, parallel transporting a vector around a latitude (other than the equator) will result in a rotated vector.
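The sphere claim can be verified numerically by integrating the parallel transport equation \(\dot V^k + \Gamma^k_{ij}\,(\gamma')^i V^j = 0\) around a latitude circle. The sketch below (assuming NumPy/SciPy; the latitude \(\theta_0 = \pi/3\) is an arbitrary choice) recovers the classical holonomy angle \(2\pi(1-\cos\theta_0)\), the solid angle enclosed by the loop:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Parallel transport around the latitude theta = theta0 on the unit sphere,
# coordinates (theta, phi), metric g = diag(1, sin^2 theta).
theta0 = np.pi / 3   # 60 degrees from the pole (arbitrary, not the equator)

def transport_ode(t, V):
    """dV^k/dt = -Gamma^k_{ij} (gamma')^i V^j along gamma(t) = (theta0, t)."""
    Vth, Vph = V
    dVth = np.sin(theta0) * np.cos(theta0) * Vph      # -Gamma^theta_{phi phi} V^phi
    dVph = -(np.cos(theta0) / np.sin(theta0)) * Vth   # -Gamma^phi_{phi theta} V^theta
    return [dVth, dVph]

V0 = np.array([1.0, 0.0])   # start with the unit vector d/dtheta
sol = solve_ivp(transport_ode, (0.0, 2 * np.pi), V0, rtol=1e-10, atol=1e-12)
V1 = sol.y[:, -1]

# Angle between V0 and V1, measured with the metric at the (same) base point.
g = np.diag([1.0, np.sin(theta0) ** 2])
cos_angle = (V0 @ g @ V1) / np.sqrt((V0 @ g @ V0) * (V1 @ g @ V1))
print("measured rotation (deg):", np.degrees(np.arccos(np.clip(cos_angle, -1, 1))))
print("predicted 2*pi*(1 - cos(theta0)) (deg):",
      np.degrees((2 * np.pi * (1 - np.cos(theta0))) % (2 * np.pi)))
```

Metric compatibility also shows up as a built-in sanity check here: the transported vector keeps unit norm throughout the integration.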
Geodesics Revisited: A curve \(\gamma(t)\) is a geodesic if and only if its tangent vector \(\gamma'(t)\) is parallel transported along itself: \(\nabla_{\gamma'(t)} \gamma'(t) = 0\). This is exactly the geodesic equation we saw earlier.
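For instance, on the unit sphere the latitude circle \(\gamma(t) = (\theta_0, t)\) has covariant acceleration \((\nabla_{\gamma'}\gamma')^\theta = \Gamma^\theta_{\phi\phi} = -\sin\theta_0\cos\theta_0\) (and \((\nabla_{\gamma'}\gamma')^\phi = 0\)), which vanishes only at the equator \(\theta_0 = \pi/2\). A tiny check, sketched with NumPy:

```python
import numpy as np

# Is the latitude theta = theta0 on the unit sphere a geodesic? For
# gamma(t) = (theta0, t), gamma' = (0, 1), the only possibly nonzero
# component of nabla_{gamma'} gamma' is Gamma^theta_{phi phi} = -sin*cos.
for theta0 in (np.pi / 3, np.pi / 2):
    acceleration = -np.sin(theta0) * np.cos(theta0)
    print(f"theta0 = {theta0:.3f}: covariant acceleration = {acceleration:+.3f}",
          "(geodesic)" if abs(acceleration) < 1e-12 else "(not a geodesic)")
```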
4. Curvature: Measuring the “Bending” of a Manifold
Curvature quantifies how much the geometry of a Riemannian manifold deviates from being Euclidean (“flat”). A key way it manifests is that the result of parallel transporting a vector between two points can depend on the path taken.
The Riemann curvature tensor (or Riemann tensor) \(R\) measures the failure of second covariant derivatives to commute. For vector fields \(X, Y, Z\), it is defined as:
\[R(X,Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z\]
It’s a \((1,3)\)-tensor, meaning it takes three vector fields and produces one vector field. Its components in a local coordinate system (\(\partial_i = \partial/\partial x^i\)) are \(R^l_{ijk}\), where
\[R(\partial_i, \partial_j)\partial_k = \sum_l R^l_{ijk} \partial_l\]
Explicitly:
\[R^l_{ijk} = \frac{\partial \Gamma^l_{kj}}{\partial x^i} - \frac{\partial \Gamma^l_{ki}}{\partial x^j} + \sum_m (\Gamma^m_{kj} \Gamma^l_{im} - \Gamma^m_{ki} \Gamma^l_{jm})\]
A manifold is flat (locally isometric to Euclidean space) if and only if its Riemann curvature tensor is identically zero.
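Continuing the sphere example, the sketch below (SymPy; the helper functions `christoffel` and `riemann` are ours, implementing the two coordinate formulas above) builds \(R^l_{ijk}\) from the Christoffel symbols and recovers \(g(R(\partial_\theta,\partial_\phi)\partial_\phi, \partial_\theta) = \sin^2\theta\):

```python
import sympy as sp

# Unit sphere: coordinates (theta, phi), metric g = diag(1, sin^2 theta).
theta, phi = sp.symbols('theta phi', positive=True)
coords, n = [theta, phi], 2
g = sp.Matrix([[1, 0], [0, sp.sin(theta) ** 2]])
g_inv = g.inv()

def christoffel(k, i, j):
    """Gamma^k_{ij} of the Levi-Civita connection."""
    return sp.Rational(1, 2) * sum(
        g_inv[k, l] * (sp.diff(g[j, l], coords[i]) + sp.diff(g[i, l], coords[j])
                       - sp.diff(g[i, j], coords[l])) for l in range(n))

def riemann(l, i, j, k):
    """R^l_{ijk} = d_i Gamma^l_{jk} - d_j Gamma^l_{ik} + Gamma^m_{jk} Gamma^l_{im} - Gamma^m_{ik} Gamma^l_{jm}."""
    return sp.simplify(
        sp.diff(christoffel(l, j, k), coords[i]) - sp.diff(christoffel(l, i, k), coords[j])
        + sum(christoffel(m, j, k) * christoffel(l, i, m)
              - christoffel(m, i, k) * christoffel(l, j, m) for m in range(n)))

# g(R(d_theta, d_phi) d_phi, d_theta) = R^theta_{theta phi phi} since g_{theta theta} = 1
print(riemann(0, 0, 1, 1))   # -> sin(theta)**2
```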
Symmetries of the Riemann Tensor: The Riemann tensor (in its \((0,4)\) form, \(R_{lijk} = g_{lm}R^m_{ijk}\)) has several symmetries:
- \(R_{lijk} = -R_{ljik}\): antisymmetry in the arguments \(X, Y\) (from \(R(X,Y)Z = -R(Y,X)Z\)).
- \(R_{lijk} = -R_{kijl}\): antisymmetry in the other pair, a consequence of metric compatibility (\(g(R(X,Y)Z, W) = -g(R(X,Y)W, Z)\)).
- Pair symmetry: \(R_{lijk} = R_{jkli}\) (i.e., \(g(R(X,Y)Z, W) = g(R(Z,W)X, Y)\)).
- First Bianchi Identity: \(R(X,Y)Z + R(Y,Z)X + R(Z,X)Y = 0\) (or \(R_{lijk} + R_{ljki} + R_{lkij} = 0\)).
These symmetries reduce the number of independent components. For an \(n\)-manifold, there are \(n^2(n^2-1)/12\) independent components.
- For \(n=2\) (surfaces), there is 1 independent component.
- For \(n=3\), there are 6 independent components.
- For \(n=4\), there are 20 independent components.
Sectional Curvature
For a two-dimensional plane \(\sigma \subset T_p M\) spanned by orthonormal vectors \(u, v\), the sectional curvature \(K(\sigma)\) or \(K(u,v)\) is given by:
\[K(u,v) = g(R(u,v)v, u)\]
This measures the Gaussian curvature of the 2D surface swept out by geodesics starting at \(p\) in directions within \(\sigma\). If all sectional curvatures are equal to a constant \(c\), the manifold is a space of constant curvature.
- \(c > 0\): e.g., sphere (locally).
- \(c = 0\): e.g., Euclidean space.
- \(c < 0\): e.g., hyperbolic space (locally).
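As a quick check of the \(c < 0\) case, the sketch below (SymPy, reusing the same hypothetical `christoffel`/`riemann` helpers) computes the sectional curvature of the Poincaré upper half-plane, \(g = \mathrm{diag}(1/y^2, 1/y^2)\). For coordinate (not necessarily orthonormal) vectors, the formula generalizes to \(K(u,v) = g(R(u,v)v, u)\,/\,\big(g(u,u)g(v,v) - g(u,v)^2\big)\):

```python
import sympy as sp

# Poincare upper half-plane: coordinates (x, y), y > 0, metric g = diag(1/y^2, 1/y^2).
x = sp.symbols('x', real=True)
y = sp.symbols('y', positive=True)
coords, n = [x, y], 2
g = sp.Matrix([[1 / y**2, 0], [0, 1 / y**2]])
g_inv = g.inv()

def christoffel(k, i, j):
    return sp.Rational(1, 2) * sum(
        g_inv[k, l] * (sp.diff(g[j, l], coords[i]) + sp.diff(g[i, l], coords[j])
                       - sp.diff(g[i, j], coords[l])) for l in range(n))

def riemann(l, i, j, k):
    return sp.simplify(
        sp.diff(christoffel(l, j, k), coords[i]) - sp.diff(christoffel(l, i, k), coords[j])
        + sum(christoffel(m, j, k) * christoffel(l, i, m)
              - christoffel(m, i, k) * christoffel(l, j, m) for m in range(n)))

# K = g(R(d_x, d_y) d_y, d_x) / (g_xx g_yy - g_xy^2) for the coordinate plane.
numerator = sum(g[l, 0] * riemann(l, 0, 1, 1) for l in range(n))
K = sp.simplify(numerator / (g[0, 0] * g[1, 1] - g[0, 1] ** 2))
print(K)   # -> -1 (constant negative curvature)
```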
Ricci Curvature and Scalar Curvature
By contracting the Riemann tensor, we get simpler curvature measures:
**Ricci Tensor** (\((0,2)\)-tensor):
\[\text{Ric}(X,Y) = \sum_i g(R(E_i, X)Y, E_i)\]
where \(\{E_i\}\) is a local orthonormal frame. In coordinates: \(R_{jk} = \sum_i R^i_{ijk}\) (contracting the upper index of \(R^l_{ijk}\) with the first lower index). The Ricci tensor measures how the volume of a small geodesic ball deviates from that of a Euclidean ball. It plays a key role in Einstein’s theory of general relativity.
**Scalar Curvature** (\(0\)-tensor, i.e., a scalar function):
\[S = \text{tr}_g(\text{Ric}) = \sum_i \text{Ric}(E_i, E_i)\]
In coordinates: \(S = \sum_{j,k} g^{jk} R_{jk}\). It’s the “total” curvature at a point. For surfaces (\(n=2\)), \(S = 2K\), where \(K\) is the Gaussian curvature.
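Both contractions are easy to check on the unit sphere. The sketch below (SymPy; it repeats the hypothetical helpers from the earlier examples so it runs standalone) finds \(\mathrm{Ric} = g\), i.e. the sphere is an Einstein manifold, and \(S = 2 = 2K\):

```python
import sympy as sp

# Ricci and scalar curvature of the unit sphere by contracting the Riemann tensor.
theta, phi = sp.symbols('theta phi', positive=True)
coords, n = [theta, phi], 2
g = sp.Matrix([[1, 0], [0, sp.sin(theta) ** 2]])
g_inv = g.inv()

def christoffel(k, i, j):
    return sp.Rational(1, 2) * sum(
        g_inv[k, l] * (sp.diff(g[j, l], coords[i]) + sp.diff(g[i, l], coords[j])
                       - sp.diff(g[i, j], coords[l])) for l in range(n))

def riemann(l, i, j, k):
    return sp.simplify(
        sp.diff(christoffel(l, j, k), coords[i]) - sp.diff(christoffel(l, i, k), coords[j])
        + sum(christoffel(m, j, k) * christoffel(l, i, m)
              - christoffel(m, i, k) * christoffel(l, j, m) for m in range(n)))

# Ricci: R_jk = sum_i R^i_{ijk}; scalar: S = sum_{j,k} g^{jk} R_jk.
Ric = sp.Matrix(n, n, lambda j, k: sum(riemann(i, i, j, k) for i in range(n)))
S = sp.simplify(sum(g_inv[j, k] * Ric[j, k] for j in range(n) for k in range(n)))
print(sp.simplify(Ric))   # -> Matrix([[1, 0], [0, sin(theta)**2]]), i.e. Ric = g
print(S)                  # -> 2
```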
Curvature in Machine Learning
Curvature of the parameter manifold (or loss landscape) has significant implications for optimization:
- **Positive curvature (like a bowl):** Often associated with well-behaved minima. Gradients point consistently towards the minimum.
- **Negative curvature (like a saddle):** Characterizes saddle points, which can slow down first-order optimization methods. Second-order methods (using Hessian information) can exploit negative curvature directions to escape saddles. Under certain conditions (e.g., when the parameter manifold carries the Fisher Information Metric (FIM)), the Hessian of the loss function can be related to the Ricci curvature of the parameter manifold.
- **Flat regions (zero curvature):** Can lead to plateaus where gradients are very small, slowing convergence.
- The **Riemann curvature tensor** of a statistical manifold equipped with the FIM provides detailed information about the interactions between parameters and the local geometry. These concepts are explored in Information Geometry. For example, Amari’s \(\alpha\)-connections and \(\alpha\)-curvatures generalize these notions.
5. Conclusion of the Crash Course
This three-part crash course has taken us on a journey from the basic idea of smooth manifolds to the intricate concept of curvature:
- **Part 1:** Introduced **smooth manifolds** as the spaces where our parameters or data might live, and **tangent spaces** as the realms of local directions and derivatives.
- **Part 2:** Equipped manifolds with **Riemannian metrics**, allowing us to measure distances, angles, and volumes, and to define **geodesics** as the “straightest” paths.
- **Part 3:** Developed tools for differentiation on manifolds (**connections**, **covariant derivatives**), understood how to move vectors consistently (**parallel transport**), and quantified how manifolds bend (**curvature tensors**).
Differential geometry provides a powerful language and a rich set of tools to analyze complex systems. In machine learning, it helps us understand:
- The structure of high-dimensional parameter spaces.
- The behavior of optimization algorithms on non-Euclidean loss landscapes.
- The intrinsic properties of families of statistical models.
While this crash course has only scratched the surface, we hope it has provided you with the core intuitions and definitions needed to appreciate the geometric perspectives increasingly found in machine learning research. From here, you are better equipped to explore more advanced topics, such as:
- **Information Geometry:** The specific application of DG to manifolds of probability distributions, with the Fisher Information Metric playing a central role. (This will be a follow-up crash course!)
- Lie groups and symmetric spaces in ML.
- Geometric deep learning (applying DG/topology to graph-structured data, etc.).
- Advanced optimization methods on manifolds.
Thank you for joining this crash course! We encourage you to continue exploring the beautiful interplay between geometry and machine learning.