Tensor Calculus Part 1: From Vectors to Tensors – Multilinear Algebra
Introduction to tensors, their importance in ML, Einstein summation convention, and fundamental tensor algebraic operations like outer product and contraction.
Welcome to the first part of our crash course on Tensor Calculus! Our goal here is to build a solid foundation, starting from familiar concepts like vectors and gradually introducing the more general framework of tensors. We’ll focus on definitions, notation, and basic algebraic operations that are indispensable for understanding advanced topics in machine learning and optimization.
1. Introduction: Why Tensors? The ML Motivation
You’ve likely encountered “tensors” in machine learning libraries like TensorFlow or PyTorch, often referring to multi-dimensional arrays. While this is a practical starting point, the mathematical concept of a tensor is richer and more fundamental. Tensors are geometric objects whose components transform in very specific ways when you change your coordinate system. This transformation property is key – it ensures that physical laws and geometric relationships described by tensors remain consistent, regardless of the chosen observational frame.
I mentioned in the linear algebra course that you can indeed multiply vectors, but at the cost of leaving the realm of linearity; tensors are built on exactly this fact through the tensor product. Just as real vectors can be thought of as geometric objects that exist independently of any coordinate system (rather than as a mere “grid of numbers”), tensors are products of vectors and covectors and are likewise invariant under coordinate transformations. What will interest us is how their components in a given coordinate system change.
Why is this important for Machine Learning?
- Handling High-Dimensional Data Naturally: Gradients of scalar loss functions with respect to matrix or higher-order weights (e.g., \[ \nabla_{\mathbf{W}} L \], where \[ \mathbf{W} \] could be the weights of a convolutional layer) are inherently tensorial. Understanding their structure helps in designing and analyzing optimizers.
- Describing Geometric Properties: The Hessian matrix, which describes the local curvature of the loss surface, is a (0,2)-tensor. Its properties are crucial for second-order optimization methods. More complex geometric features, especially in the context of Information Geometry (e.g., the Fisher Information Matrix), are also tensors.
- Representing Multi-Linear Relationships: Some advanced neural network architectures or operations, like certain forms of attention mechanisms or bilinear pooling, involve interactions between multiple vector spaces. Tensors provide the natural language for describing such multi-linear maps.
In essence, tensors provide a robust mathematical framework for dealing with quantities that have magnitude, direction, and potentially multiple “orientations” in space, especially when those quantities need to behave consistently under transformations.
2. Revisiting Vectors and Covectors (Dual Vectors)
Before diving into general tensors, let’s solidify our understanding of vectors and introduce their close relatives, covectors.
Consider an \[ n \]-dimensional real vector space \[ V \]. A vector \[ \mathbf{v} \in V \] can be expressed as a linear combination of basis vectors \[ \{\mathbf{e}_1, \dots, \mathbf{e}_n\} \]:
\[ \mathbf{v} = \sum_{i=1}^{n} v^i \mathbf{e}_i \]
The quantities \[ v^i \] are the components of the vector \[ \mathbf{v} \] in this basis. We adopt the convention of writing these components with an upper index. As we’ll see in Part 2, these components transform in a specific way (contravariantly) under coordinate changes.
Associated with every vector space \[ V \] is its dual vector space \[ V^\ast \]. The elements of \[ V^\ast \] are called covectors (or dual vectors, linear functionals, or 1-forms). A covector \[ \boldsymbol{\omega} \in V^\ast \] is a linear map from \[ V \] to the field of real numbers \[ \mathbb{R} \]:
\[ \boldsymbol{\omega}: V \to \mathbb{R} \]
Given a basis \[ \{\mathbf{e}_i\} \] for \[ V \], we can define a dual basis \[ \{\boldsymbol{\epsilon}^j\} \] for \[ V^\ast \] such that:
\[ \boldsymbol{\epsilon}^j(\mathbf{e}_i) = \delta^j_i \]
Here, \[ \delta^j_i \] is the Kronecker delta, which is 1 if \[ i = j \] and 0 if \[ i \neq j \]. A covector \[ \boldsymbol{\omega} \] can then be written as a linear combination of dual basis covectors:
\[ \boldsymbol{\omega} = \sum_{j=1}^{n} \omega_j \boldsymbol{\epsilon}^j \]
The quantities \[ \omega_j \] are the components of the covector \[ \boldsymbol{\omega} \] in this dual basis. We write these components with a lower index. These components transform covariantly (see Part 2).
The action of a covector \[ \boldsymbol{\omega} \] on a vector \[ \mathbf{v} \] is a scalar, obtained by:
\[ \boldsymbol{\omega}(\mathbf{v}) = \boldsymbol{\omega}\Big(\sum_i v^i \mathbf{e}_i\Big) = \sum_i v^i \, \boldsymbol{\omega}(\mathbf{e}_i) = \sum_i \omega_i v^i \]
This sum \[ \sum_i \omega_i v^i \] is a fundamental operation and motivates the Einstein summation convention, which we’ll introduce shortly. The distinction between vectors (upper-indexed components) and covectors (lower-indexed components) is crucial in tensor calculus, especially when dealing with non-Euclidean geometries or curvilinear coordinates.
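The pairing above is easy to sketch numerically. A minimal example, assuming NumPy (the array names are illustrative; note that a plain array does not record whether its index is upper or lower, so that bookkeeping is ours):

```python
import numpy as np

v = np.array([2.0, -1.0, 4.0])   # components v^i of a vector
w = np.array([0.5, 3.0, 1.0])    # components w_i of a covector

# Action of the covector on the vector: sum_i w_i v^i
scalar = float(np.dot(w, v))     # 2*0.5 + (-1)*3 + 4*1 = 2.0
```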
3. Defining Tensors
With vectors and covectors in mind, we can now define tensors more generally.
Definition. Tensor of Type \[ (p,q) \]
A **tensor \[ T \] of type (or rank) \[ (p,q) \]** over a vector space \[ V \] is a multilinear map that takes \[ p \] covectors from the dual space \[ V^\ast \] and \[ q \] vectors from the vector space \[ V \] as arguments, and returns a scalar from \[ \mathbb{R} \]:
\[ T: \underbrace{V^\ast \times \dots \times V^\ast}_{p \text{ times}} \times \underbrace{V \times \dots \times V}_{q \text{ times}} \to \mathbb{R} \]
The integer \[ p \] is called the contravariant rank and \[ q \] is called the covariant rank. The total rank is \[ p+q \].
In a chosen basis \[ \{\mathbf{e}_i\} \] for \[ V \] and its dual basis \[ \{\boldsymbol{\epsilon}^j\} \] for \[ V^\ast \], the tensor \[ T \] can be represented by a set of \[ n^{p+q} \] components, typically written as:
\[ T^{i_1 i_2 \dots i_p}_{j_1 j_2 \dots j_q} \]
where each index \[ i_k \] ranges from \[ 1 \] to \[ n \], and each index \[ j_l \] ranges from \[ 1 \] to \[ n \]. The upper indices correspond to the contravariant part (related to vectors) and the lower indices to the covariant part (related to covectors).
Let’s look at some examples:
- A scalar \[ s \in \mathbb{R} \] is a (0,0)-tensor. It takes no vector or covector arguments and is just a number. Its component is itself (no indices).
- A contravariant vector \[ \mathbf{v} \] (with components \[ v^i \]) is a (1,0)-tensor. It can be thought of as a map \[ V^\ast \to \mathbb{R} \], taking one covector to a scalar.
- A covariant vector (covector) \[ \boldsymbol{\omega} \] (with components \[ \omega_j \]) is a (0,1)-tensor. It maps one vector to a scalar: \[ \mathbf{v} \mapsto \boldsymbol{\omega}(\mathbf{v}) \].
- A linear transformation (matrix) \[ \mathbf{A} \] mapping vectors from \[ V \] to \[ V \] can be represented as a (1,1)-tensor with components \[ A^i_j \]. It can take one covector \[ \boldsymbol{\omega} \] and one vector \[ \mathbf{v} \] to produce a scalar: \[ \boldsymbol{\omega}(\mathbf{A}\mathbf{v}) = \sum_{i,j} \omega_i A^i_j v^j \].
- A bilinear form \[ B \] (e.g., an inner product) that takes two vectors \[ \mathbf{u}, \mathbf{v} \] and produces a scalar \[ B(\mathbf{u}, \mathbf{v}) \] is a (0,2)-tensor with components \[ B_{ij} \]. The metric tensor, which we’ll meet later, is of this type.
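These examples map directly onto multi-dimensional arrays. A hedged NumPy sketch (shapes and names are illustrative):

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)

s = 1.5                            # (0,0)-tensor: a scalar, no indices
v = rng.standard_normal(n)         # (1,0)-tensor: components v^i
w = rng.standard_normal(n)         # (0,1)-tensor: components w_j
A = rng.standard_normal((n, n))    # (1,1)-tensor: components A^i_j
B = rng.standard_normal((n, n))    # (0,2)-tensor: components B_{ij}

# A (p,q)-tensor over an n-dimensional space has n**(p+q) components,
# e.g. a (2,1)-tensor T^{ij}_k:
T = rng.standard_normal((n, n, n))
```

Note that the array shape records only the total rank \[ p+q \]; whether each axis is contravariant or covariant is extra bookkeeping the programmer must maintain.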
4. Einstein Summation Convention
Writing out sums like \[ \sum_i \omega_i v^i \] can become cumbersome with multiple indices. The Einstein summation convention simplifies this notation significantly.
Rule: If an index variable appears twice in a single term, once as an upper (contravariant) index and once as a lower (covariant) index, summation over all possible values of that index (typically from 1 to \[ n \], the dimension of the space) is implied.
Warning. Einstein Notation: Components vs. Abstract Tensors
Einstein summation notation primarily deals with the components of tensors (e.g., \[ v^i \], \[ A^i_j \]). This means that equations written using this notation are relationships between these numerical components and are therefore implicitly dependent on the chosen basis.
The “true formula” or abstract representation of a tensor equation (e.g., \[ \mathbf{y} = \mathbf{A}(\mathbf{x}) \]) or an explicit basis representation (e.g., \[ \mathbf{v} = v^i \mathbf{e}_i \]) is inherently basis-independent or makes the basis dependence explicit. In contrast, component equations like \[ y^i = A^i_j x^j \] are compact but “hide” the basis vectors. This distinction is crucial: the components \[ v^i \] will change if the basis \[ \{\mathbf{e}_i\} \] changes, even if the vector \[ \mathbf{v} \] itself does not. We will explore how components transform in Part 2.
- Such a repeated index is called a dummy index or summation index.
- An index that appears only once in a term is called a free index. Free indices must match on both sides of an equation.
Examples:
- The action of a covector on a vector: \[ \omega_i v^i \] means \[ \sum_{i=1}^{n} \omega_i v^i \].
- A vector in terms of its components and basis vectors: \[ \mathbf{v} = v^i \mathbf{e}_i \].
- A covector in terms of its components and dual basis covectors: \[ \boldsymbol{\omega} = \omega_j \boldsymbol{\epsilon}^j \].
- Matrix-vector multiplication: \[ y^i = A^i_j x^j \] means \[ y^i = \sum_j A^i_j x^j \] for each \[ i \]. Here, \[ j \] is the dummy index, and \[ i \] is the free index.
- Matrix multiplication: \[ C^i_k = A^i_j B^j_k \] means \[ C^i_k = \sum_j A^i_j B^j_k \]. Here, \[ j \] is the dummy index, while \[ i \] and \[ k \] are free indices.
- The trace of a matrix \[ A^i_j \] is \[ \operatorname{tr}(\mathbf{A}) = A^i_i \].
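NumPy’s `einsum` implements essentially this convention (it sums any repeated label, without distinguishing upper from lower indices). A sketch of the examples above, with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
w = rng.standard_normal(3)

s = np.einsum('i,i->', w, x)      # omega_i v^i : repeated i is summed
y = np.einsum('ij,j->i', A, x)    # y^i = A^i_j x^j : j dummy, i free
C = np.einsum('ij,jk->ik', A, B)  # C^i_k = A^i_j B^j_k : j dummy, i and k free
t = np.einsum('ii->', A)          # tr(A) = A^i_i
```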
Tip. Mastering Einstein Notation
This notation is fundamental to working efficiently with tensors.
- Identify dummy indices: Look for pairs of identical indices, one up, one down, in the same term. These are summed over.
- Identify free indices: Indices appearing once in a term. These must be the same on both sides of an equation.
- Relabeling dummy indices: The letter used for a dummy index doesn’t matter and can be changed, e.g., \[ A^i_k B^k_j = A^i_l B^l_j \]. Avoid reusing a dummy index letter as a free index in the same term.
- No summation for same-level repeated indices (usually): Expressions like \[ v^i w^i \] or \[ u_k v_k \] do not imply summation under the standard Einstein convention unless a metric tensor is explicitly used to form a scalar product (e.g., \[ g_{ik} v^i w^k \]). We will clarify this when we discuss the metric tensor. For now, summation applies only to one upper and one lower index.
5. Special Tensors: Kronecker Delta
The Kronecker delta \[ \delta^i_j \] is an essential (1,1)-tensor with components:
\[ \delta^i_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \]
Its matrix representation is the identity matrix. The Kronecker delta acts as a “substitution operator”:
\[ \delta^i_j v^j = v^i \quad \text{(sum over } j\text{; the only non-zero term is when } j = i\text{)} \]
\[ \delta^i_j \omega_i = \omega_j \quad \text{(sum over } i\text{)} \]
In Part 2, we will confirm that it indeed transforms as a (1,1)-tensor.
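The substitution property is easy to verify numerically; a sketch assuming NumPy, where `np.eye` supplies the delta’s matrix representation:

```python
import numpy as np

n = 4
delta = np.eye(n)                 # matrix of the Kronecker delta
v = np.arange(1.0, n + 1)
w = np.array([2.0, 0.0, -1.0, 5.0])

# delta^i_j v^j = v^i : the delta "substitutes" j by i
substituted_v = np.einsum('ij,j->i', delta, v)

# delta^i_j w_i = w_j
substituted_w = np.einsum('ij,i->j', delta, w)
```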
6. Tensor Algebra
Tensors of the same type can be combined using algebraic operations similar to those for vectors and matrices.
- Addition and Subtraction: Two tensors \[ A \] and \[ B \] can be added or subtracted if and only if they have the same type \[ (p,q) \]. The resulting tensor \[ C = A \pm B \] also has type \[ (p,q) \], and its components are the sum or difference of the corresponding components:
<div class="math-block" markdown="0"> \[ (C)^{i_1 \dots i_p}_{j_1 \dots j_q} = (A)^{i_1 \dots i_p}_{j_1 \dots j_q} \pm (B)^{i_1 \dots i_p}_{j_1 \dots j_q} \]
</div>
- Scalar Multiplication: Multiplying a tensor \[ T \] by a scalar \[ \alpha \in \mathbb{R} \] results in a tensor \[ \alpha T \] of the same type, whose components are:
<div class="math-block" markdown="0"> \[ (\alpha T)^{i_1 \dots i_p}_{j_1 \dots j_q} = \alpha (T^{i_1 \dots i_p}_{j_1 \dots j_q}) \]
</div>
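Both operations are plain componentwise arithmetic on the component arrays; a brief NumPy sketch (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3, 3))  # components of a (2,1)-tensor
B = rng.standard_normal((3, 3, 3))  # same type (2,1), so addition is defined

C = A + B      # componentwise addition
D = 2.5 * A    # scalar multiplication
```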
- Outer Product (Tensor Product): The outer product of a tensor \[ A \] of type \[ (p,q) \] with components \[ A^{i_1 \dots i_p}_{j_1 \dots j_q} \] and a tensor \[ B \] of type \[ (r,s) \] with components \[ B^{k_1 \dots k_r}_{l_1 \dots l_s} \] is a new tensor \[ C = A \otimes B \] of type \[ (p+r, q+s) \]. Its components are formed by simply multiplying the components of \[ A \] and \[ B \]:
<div class="math-block" markdown="0"> \[ (C)^{i_1 \dots i_p k_1 \dots k_r}_{j_1 \dots j_q l_1 \dots l_s} = A^{i_1 \dots i_p}_{j_1 \dots j_q} B^{k_1 \dots k_r}_{l_1 \dots l_s} \]
</div>
No indices are summed in the outer product.
Example: The outer product of two vectors \[ \mathbf{u} \] (type (1,0)) and \[ \mathbf{v} \] (type (1,0)) yields a (2,0)-tensor \[ T^{ij} = u^i v^j \].
Example: The outer product of a vector \[ \mathbf{v} \] (type (1,0)) and a covector \[ \boldsymbol{\omega} \] (type (0,1)) yields a (1,1)-tensor \[ T^i_j = v^i \omega_j \].
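Both examples can be reproduced with `einsum`; since no label is repeated, nothing is summed. A sketch with illustrative values:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
w = np.array([5.0, 6.0])   # treat as covector components w_j

# T^{ij} = u^i v^j : a (2,0)-tensor from two vectors
T = np.einsum('i,j->ij', u, v)

# S^i_j = u^i w_j : a (1,1)-tensor from a vector and a covector
S = np.einsum('i,j->ij', u, w)
```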
- Contraction: Contraction is an operation that reduces the rank of a tensor. It involves selecting one upper (contravariant) index and one lower (covariant) index in a single tensor, setting them equal, and summing over that index (Einstein summation). This reduces the contravariant rank by one and the covariant rank by one. Given a tensor \[ T^{i_1 \dots i_p}_{j_1 \dots j_q} \], if we contract, say, the \[ a \]-th upper index \[ i_a \] with the \[ b \]-th lower index \[ j_b \], we set \[ i_a = j_b = k \] (a dummy index) and sum over \[ k \]. The resulting tensor will be of type \[ (p-1, q-1) \].
<details class="details-block" markdown="1">
<summary markdown="1">
**Example.** Trace as a contraction
</summary>
Consider a (1,1)-tensor \[ A^i_j \] (like a matrix). If we contract its upper index with its lower index, we set \[ i = j = k \] (dummy index) and sum:
<div class="math-block" markdown="0"> \[ S = A^k_k = \sum_k A^k_k \]
</div>
The result \[ S \] is a (0,0)-tensor, which is a scalar. This is precisely the definition of the trace of a matrix. </details>
Another example: Given a (2,1)-tensor \[ T^{ij}_k \], we can contract \[ j \] with \[ k \]: set \[ j = k \] (dummy index). The result is \[ v^i = T^{ik}_k \]. This is a (1,0)-tensor (a contravariant vector).
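In `einsum`, contraction is expressed by repeating a label on a single operand; a sketch of the two examples (array names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))     # (1,1)-tensor A^i_j

# Contracting the upper and lower index of A gives its trace, a scalar.
trace = np.einsum('kk->', A)

# A (2,1)-tensor T^{ij}_k with axes ordered (i, j, k);
# contracting j with k gives v^i = T^{ik}_k, a (1,0)-tensor.
T = rng.standard_normal((3, 3, 3))
v = np.einsum('ikk->i', T)
```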
- Inner Product: The term “inner product” in tensor context often refers to an outer product followed by one or more contractions. For example, standard matrix multiplication \[ C^i_k = A^i_j B^j_k \] can be seen as: 1. Forming the outer product of \[ A^i_j \] and \[ B^m_k \] to get a temporary (2,2)-tensor \[ D^{im}_{jk} = A^i_j B^m_k \]. (Note: different letters are used for clarity before contraction.) 2. Contracting the upper index \[ m \] with the lower index \[ j \] (by setting \[ m = j \] and summing): \[ C^i_k = D^{ij}_{jk} = A^i_j B^j_k \].
The scalar product of a covector \[ \boldsymbol{\omega} \] and a vector \[ \mathbf{v} \] is \[ \omega_i v^i \]. This is a contraction of their outer product \[ T^i_j = v^i \omega_j \] over the indices \[ i \] and \[ j \] (by setting \[ j = i \] and summing).
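This two-step view (outer product, then contraction) can be checked directly against ordinary matrix multiplication; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Step 1: outer product D^{im}_{jk} = A^i_j B^m_k, a (2,2)-tensor
# (axes of D ordered as i, m, j, k)
D = np.einsum('ij,mk->imjk', A, B)

# Step 2: contract m with j (set m = j and sum) to recover C^i_k = A^i_j B^j_k
C = np.einsum('ijjk->ik', D)

# Likewise, omega_i v^i is the contraction of the outer product v^i omega_j
v = rng.standard_normal(3)
w = rng.standard_normal(3)
s = np.einsum('ii->', np.einsum('i,j->ij', v, w))
```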
7. Symmetry and Anti-Symmetry
Tensors can exhibit symmetry properties with respect to their indices.
- A tensor is symmetric in two of its indices if its components remain unchanged when those two indices are swapped. The indices must be of the same type (both contravariant or both covariant).
  - For a (0,2)-tensor \[ B_{ij} \]: symmetric if \[ B_{ij} = B_{ji} \] for all \[ i, j \].
  - For a (2,0)-tensor \[ T^{ij} \]: symmetric if \[ T^{ij} = T^{ji} \] for all \[ i, j \].
- A tensor is anti-symmetric (or skew-symmetric) in two of its indices if its components change sign when those two indices are swapped.
  - For a (0,2)-tensor \[ B_{ij} \]: anti-symmetric if \[ B_{ij} = -B_{ji} \] for all \[ i, j \]. This implies that if \[ i = j \], then \[ B_{ii} = -B_{ii} \], so \[ B_{ii} = 0 \] (no sum implied here; diagonal components are zero).
  - For a (2,0)-tensor \[ T^{ij} \]: anti-symmetric if \[ T^{ij} = -T^{ji} \] for all \[ i, j \].
Example from Machine Learning: The Hessian matrix of a scalar loss function \[ L(\boldsymbol{\theta}) \], with components \[ H_{ij} = \frac{\partial^2 L}{\partial \theta^i \partial \theta^j} \], is a (0,2)-tensor. If the second partial derivatives of \[ L \] are continuous, then by Clairaut’s Theorem (equality of mixed partials), \[ H_{ij} = H_{ji} \]. Thus, the Hessian is a symmetric (0,2)-tensor. This symmetry is important for its spectral properties and its role in optimization.
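The symmetry can be checked numerically with finite differences; a sketch with a toy loss `L` and a helper `hessian_fd`, both hypothetical names introduced here for illustration (assuming NumPy):

```python
import numpy as np

def L(theta):
    """A smooth toy scalar loss: L(theta) = theta_0^2 theta_1 + sin(theta_1 theta_2)."""
    return theta[0] ** 2 * theta[1] + np.sin(theta[1] * theta[2])

def hessian_fd(f, theta, h=1e-5):
    """Central-difference estimate of H_ij = d^2 f / (dtheta^i dtheta^j)."""
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            tpp = theta.copy(); tpp[i] += h; tpp[j] += h
            tpm = theta.copy(); tpm[i] += h; tpm[j] -= h
            tmp = theta.copy(); tmp[i] -= h; tmp[j] += h
            tmm = theta.copy(); tmm[i] -= h; tmm[j] -= h
            H[i, j] = (f(tpp) - f(tpm) - f(tmp) + f(tmm)) / (4 * h * h)
    return H

theta = np.array([0.3, -1.2, 0.7])
H = hessian_fd(L, theta)
```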