Cheat Sheet: Linear Algebra
A quick reference guide compiling essential theorems, identities, facts, and formulas from linear algebra, designed for the crash course on mathematical foundations for machine learning and optimization.
1. Vectors
Definitions & Operations
- Vector: An element of a vector space, often represented as an array of numbers (components). Geometrically, a quantity with magnitude and direction.
\[ \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n \]
- Vector Addition:
\[ \mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1+v_1 \\ \vdots \\ u_n+v_n \end{bmatrix} \]
  - Properties: Commutative (\( \mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u} \)), Associative (\( (\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w}) \))
- Scalar Multiplication:
\[ c\mathbf{v} = \begin{bmatrix} cv_1 \\ \vdots \\ cv_n \end{bmatrix} \]
  - Properties: Distributive over vector/scalar addition, Associative
- **Dot Product (Standard Inner Product for \( \mathbb{R}^n \)):**
\[ \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T \mathbf{v} = \sum_{i=1}^n u_i v_i \]
  - Properties:
    1. Commutative: \( \mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u} \)
    2. Distributive: \( \mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w} \)
    3. Bilinear: \( (c\mathbf{u} + \mathbf{w}) \cdot \mathbf{v} = c(\mathbf{u} \cdot \mathbf{v}) + \mathbf{w} \cdot \mathbf{v} \)
    4. Positive-definiteness: \( \mathbf{v} \cdot \mathbf{v} \ge 0 \), and \( \mathbf{v} \cdot \mathbf{v} = 0 \iff \mathbf{v} = \mathbf{0} \)
- Angle between two non-zero vectors:
\[ \cos \theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\Vert \mathbf{u} \Vert_2 \Vert \mathbf{v} \Vert_2} \]
- Orthogonal Vectors: \( \mathbf{u} \cdot \mathbf{v} = 0 \) (implies \( \theta = 90^\circ \), or \( \pi/2 \) radians, if the vectors are non-zero).
- Vector Projection: The projection of vector \( \mathbf{u} \) onto non-zero vector \( \mathbf{v} \) is:
\[ \text{proj}_{\mathbf{v}} \mathbf{u} = \frac{\mathbf{u} \cdot \mathbf{v}}{\Vert \mathbf{v} \Vert_2^2} \, \mathbf{v} \]
The scalar component of \( \mathbf{u} \) along \( \mathbf{v} \) is \( \frac{\mathbf{u} \cdot \mathbf{v}}{\Vert \mathbf{v} \Vert_2} \).
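As a quick numerical check of the definitions above, here is a minimal NumPy sketch (the vectors are arbitrary illustrative values):

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])

dot = u @ v                                                 # u . v = 3*1 + 4*0 = 3
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))   # = 3/5
theta = np.arccos(cos_theta)                                # angle in radians

proj_u_on_v = (dot / (v @ v)) * v          # vector projection of u onto v
scalar_comp = dot / np.linalg.norm(v)      # scalar component of u along v

print(proj_u_on_v)   # [3. 0.]
```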
Vector Norms
- Definition: A function \( \Vert \cdot \Vert : \mathbb{R}^n \to \mathbb{R} \) such that for all \( \mathbf{u}, \mathbf{v} \in \mathbb{R}^n \), \( c \in \mathbb{R} \):
  1. \( \Vert \mathbf{v} \Vert \ge 0 \) (Non-negativity)
  2. \( \Vert \mathbf{v} \Vert = 0 \iff \mathbf{v} = \mathbf{0} \) (Definiteness)
  3. \( \Vert c\mathbf{v} \Vert = \vert c \vert \, \Vert \mathbf{v} \Vert \) (Absolute homogeneity)
  4. \( \Vert \mathbf{u} + \mathbf{v} \Vert \le \Vert \mathbf{u} \Vert + \Vert \mathbf{v} \Vert \) (Triangle Inequality)
- **Common \( \ell_p \) Norms:** For \( p \ge 1 \):
  - \( \ell_1 \) norm (Manhattan norm): \( \Vert \mathbf{v} \Vert_1 = \sum_{i=1}^n \vert v_i \vert \)
  - \( \ell_2 \) norm (Euclidean norm): \( \Vert \mathbf{v} \Vert_2 = \sqrt{\sum_{i=1}^n v_i^2} \)
  - \( \ell_p \) norm (general \( p \)): \( \Vert \mathbf{v} \Vert_p = \left( \sum_{i=1}^n \vert v_i \vert^p \right)^{1/p} \)
  - \( \ell_\infty \) norm (Max norm): \( \Vert \mathbf{v} \Vert_\infty = \max_i \vert v_i \vert \)
- Cauchy-Schwarz Inequality:
\[ \vert \mathbf{u} \cdot \mathbf{v} \vert \le \Vert \mathbf{u} \Vert_2 \Vert \mathbf{v} \Vert_2 \]
More generally for inner products: \( \vert \langle \mathbf{u}, \mathbf{v} \rangle \vert \le \Vert \mathbf{u} \Vert \, \Vert \mathbf{v} \Vert \). Equality holds if and only if \( \mathbf{u} \) and \( \mathbf{v} \) are linearly dependent.
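The common norms, and the Cauchy-Schwarz bound, can be checked numerically (a small sketch with illustrative vectors):

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])

l1 = np.linalg.norm(v, 1)         # |3| + |-4| + |0| = 7
l2 = np.linalg.norm(v, 2)         # sqrt(9 + 16 + 0) = 5
linf = np.linalg.norm(v, np.inf)  # max |v_i| = 4

# Cauchy-Schwarz: |u . v| <= ||u||_2 * ||v||_2
u = np.array([1.0, 2.0, 2.0])
assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v)
```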
2. Matrices
Definitions & Special Matrices
- Matrix: A rectangular array of numbers, \( A \in \mathbb{R}^{m \times n} \), with entries \( A_{ij} \).

| Type | Definition / Condition | Key Notes |
| --- | --- | --- |
| Identity (\( I \) or \( I_n \)) | Square matrix; \( I_{ii} = 1 \), \( I_{ij} = 0 \) for \( i \ne j \) | \( AI = IA = A \) |
| Zero (\( 0 \)) | All entries \( 0 \) | |
| Diagonal | \( A_{ij} = 0 \) for \( i \ne j \) | |
| Symmetric | \( A = A^T \) (square matrix) | Real eigenvalues, orthogonally diagonalizable |
| Skew-Symmetric | \( A = -A^T \) (square matrix) | Diagonal elements \( A_{ii} = 0 \). Eigenvalues are 0 or purely imaginary. |
| Upper Triangular | \( A_{ij} = 0 \) for \( i > j \) (square matrix) | Eigenvalues are the diagonal entries |
| Lower Triangular | \( A_{ij} = 0 \) for \( i < j \) (square matrix) | Eigenvalues are the diagonal entries |
| Orthogonal (\( Q \)) | Square matrix; \( Q^T Q = Q Q^T = I \) | \( Q^{-1} = Q^T \), columns/rows form an orthonormal basis, \( \det(Q) = \pm 1 \), preserves dot products & \( \ell_2 \) norms |
| Normal | Square matrix; \( A^T A = A A^T \) (real) or \( A^* A = A A^* \) (complex) | Unitarily diagonalizable. Includes symmetric, skew-symmetric, orthogonal matrices. |
Matrix Operations
- Matrix Addition/Subtraction: Element-wise. Matrices must have the same dimensions.
- Scalar Multiplication: Multiply each element by the scalar.
- Matrix Multiplication: If \( A \in \mathbb{R}^{m \times n} \) and \( B \in \mathbb{R}^{n \times p} \), then \( C = AB \in \mathbb{R}^{m \times p} \), where
\[ C_{ij} = \sum_{k=1}^n A_{ik} B_{kj} \]
  - Properties:
    1. Associative: \( (AB)C = A(BC) \)
    2. Distributive: \( A(B + C) = AB + AC \) and \( (A + B)C = AC + BC \)
    3. **Not Commutative** in general: \( AB \ne BA \)
- **Transpose (\( A^T \)):** Rows become columns (and vice versa). \( (A^T)_{ij} = A_{ji} \).
  - Properties:
    1. \( (A^T)^T = A \)
    2. \( (A + B)^T = A^T + B^T \)
    3. \( (cA)^T = cA^T \)
    4. \( (AB)^T = B^T A^T \) (Reverse order)
- **Matrix Inverse (\( A^{-1} \)):** For a square matrix \( A \), the inverse is a matrix \( A^{-1} \) such that \( AA^{-1} = A^{-1}A = I \). \( A \) is invertible (or non-singular) if \( A^{-1} \) exists. This is true if and only if \( \det(A) \ne 0 \).
  - Properties:
    1. \( (A^{-1})^{-1} = A \)
    2. \( (AB)^{-1} = B^{-1} A^{-1} \) (if A, B are invertible, reverse order)
    3. \( (A^T)^{-1} = (A^{-1})^T \)
- **Trace (\( \text{tr}(A) \))**: Sum of diagonal elements of a square matrix: \( \text{tr}(A) = \sum_{i} A_{ii} \).
  - Properties:
    1. \( \text{tr}(A + B) = \text{tr}(A) + \text{tr}(B) \)
    2. \( \text{tr}(cA) = c \, \text{tr}(A) \)
    3. \( \text{tr}(AB) = \text{tr}(BA) \) (Cyclic property. Valid if \( AB \) and \( BA \) are square, e.g., \( \text{tr}(ABC) = \text{tr}(BCA) = \text{tr}(CAB) \)).
    4. \( \text{tr}(A^T) = \text{tr}(A) \)
    5. \( \text{tr}(A^T B) = \sum_{i,j} A_{ij} B_{ij} \)
- **Hadamard Product (Element-wise product, \( \odot \)):** \( (A \odot B)_{ij} = A_{ij} B_{ij} \). Matrices must have the same dimensions.
- **Moore-Penrose Pseudoinverse (\( A^+ \)):** For any matrix \( A \in \mathbb{R}^{m \times n} \), the pseudoinverse \( A^+ \in \mathbb{R}^{n \times m} \) is the unique matrix satisfying:
  1. \( A A^+ A = A \)
  2. \( A^+ A A^+ = A^+ \)
  3. \( (A A^+)^T = A A^+ \) (so \( A A^+ \) is symmetric)
  4. \( (A^+ A)^T = A^+ A \) (so \( A^+ A \) is symmetric)
  - If \( A \) has full column rank (\( m \ge n \), \( \text{rank}(A) = n \)), then \( A^+ = (A^T A)^{-1} A^T \). In this case, \( A^+ A = I_n \).
  - If \( A \) has full row rank (\( m \le n \), \( \text{rank}(A) = m \)), then \( A^+ = A^T (A A^T)^{-1} \). In this case, \( A A^+ = I_m \).
  - Computation via SVD: If \( A = U \Sigma V^T \) is the SVD of \( A \), then \( A^+ = V \Sigma^+ U^T \), where \( \Sigma^+ \) is formed by taking the reciprocal of the non-zero singular values in \( \Sigma \) and transposing the resulting matrix.
  - Use: The vector \( \mathbf{x} = A^+ \mathbf{b} \) is the minimum norm (\( \ell_2 \)) least-squares solution to \( A\mathbf{x} = \mathbf{b} \).
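A sketch of the full-column-rank case with NumPy (`np.linalg.pinv` computes the pseudoinverse via the SVD; the matrix here is illustrative):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # 3x2, full column rank
b = np.array([1.0, 2.0, 2.0])

A_pinv = np.linalg.pinv(A)        # pseudoinverse, computed via SVD

# Full column rank: A+ = (A^T A)^{-1} A^T, and A+ A = I_n
A_pinv_formula = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_pinv, A_pinv_formula)
assert np.allclose(A_pinv @ A, np.eye(2))

# x = A+ b is the least-squares solution to Ax = b
x = A_pinv @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```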
Matrix Norms
- Operator Norm (Induced Norm):
\[ \Vert A \Vert_p = \sup_{\mathbf{x} \ne \mathbf{0}} \frac{\Vert A\mathbf{x} \Vert_p}{\Vert \mathbf{x} \Vert_p} = \sup_{\Vert \mathbf{x} \Vert_p = 1} \Vert A\mathbf{x} \Vert_p \]
  - Spectral Norm (\( p = 2 \)): \( \Vert A \Vert_2 = \sigma_{\max}(A) \) (largest singular value of \( A \))
\[ \Vert A \Vert_2 = \sqrt{\lambda_{\max}(A^T A)} \]
where \( \lambda_{\max}(A^T A) \) is the largest eigenvalue of \( A^T A \).
  - General property: \( \Vert AB \Vert_p \le \Vert A \Vert_p \Vert B \Vert_p \) (submultiplicativity for induced norms).
- Frobenius Norm:
\[ \Vert A \Vert_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n \vert A_{ij} \vert^2} = \sqrt{\text{tr}(A^T A)} \]
  - Properties:
    1. \( \Vert A \Vert_F^2 = \sum_i \sigma_i^2(A) \) (sum of squares of singular values of \( A \))
    2. Submultiplicative: \( \Vert AB \Vert_F \le \Vert A \Vert_F \Vert B \Vert_F \)
    3. \( \Vert A \Vert_2 \le \Vert A \Vert_F \)
3. Linear Systems, Vector Spaces & Subspaces
Linear Transformations
Definition. Linear Transformation
A transformation (or mapping) \( T: V \to W \) from a vector space \( V \) to a vector space \( W \) is linear if for all vectors \( \mathbf{u}, \mathbf{v} \in V \) and all scalars \( c \):
\[ T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad \text{(Additivity)} \]
\[ T(c\mathbf{u}) = cT(\mathbf{u}) \quad \text{(Homogeneity)} \]
Any linear transformation \( T: \mathbb{R}^n \to \mathbb{R}^m \) can be represented by matrix multiplication \( T(\mathbf{x}) = A\mathbf{x} \) for a unique matrix \( A \in \mathbb{R}^{m \times n} \). The columns of \( A \) are the images of the standard basis vectors of \( \mathbb{R}^n \) under \( T \).
Systems of Linear Equations
- A system of linear equations can be written as \( A\mathbf{x} = \mathbf{b} \), where \( A \) is the coefficient matrix, \( \mathbf{x} \) is the vector of unknowns, and \( \mathbf{b} \) is the constant vector.
- A solution exists if and only if \( \mathbf{b} \in C(A) \) (the column space of \( A \)). Equivalently, \( \text{rank}(A) = \text{rank}([A \mid \mathbf{b}]) \).
- If a solution exists:
  - It is unique if and only if \( N(A) = \{ \mathbf{0} \} \) (i.e., columns of \( A \) are linearly independent). This implies \( \text{rank}(A) = n \) and \( \text{nullity}(A) = 0 \).
  - If \( A \) is square (\( m = n \)) and invertible (\( \det(A) \ne 0 \)), there is a unique solution \( \mathbf{x} = A^{-1}\mathbf{b} \).
  - If there are free variables (i.e., \( \text{rank}(A) < n \)), there are infinitely many solutions. The general solution is \( \mathbf{x} = \mathbf{x}_p + \mathbf{x}_h \), where \( \mathbf{x}_p \) is a particular solution and \( \mathbf{x}_h \) is any solution to the homogeneous equation \( A\mathbf{x}_h = \mathbf{0} \).
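A minimal sketch of the square invertible case with NumPy (values are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Existence check: rank(A) == rank([A | b]), so a solution exists;
# A is also invertible, so the solution is unique.
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

x = np.linalg.solve(A, b)   # preferred over forming A^{-1} explicitly
assert np.allclose(A @ x, b)
```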
Vector Space Axioms
Definition. Vector Space
A set \( V \) equipped with two operations, vector addition (\( + \)) and scalar multiplication (\( \cdot \)), is a vector space over a field \( F \) (typically \( \mathbb{R} \) or \( \mathbb{C} \)) if it satisfies the following axioms for all vectors \( \mathbf{u}, \mathbf{v}, \mathbf{w} \in V \) and scalars \( a, b \in F \):
- \( \mathbf{u} + \mathbf{v} \in V \) (Closure under addition)
- \( \mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u} \) (Commutativity of addition)
- \( (\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w}) \) (Associativity of addition)
- There exists a zero vector \( \mathbf{0} \in V \) such that \( \mathbf{v} + \mathbf{0} = \mathbf{v} \) for all \( \mathbf{v} \in V \) (Additive identity)
- For every \( \mathbf{v} \in V \), there exists an additive inverse \( -\mathbf{v} \in V \) such that \( \mathbf{v} + (-\mathbf{v}) = \mathbf{0} \) (Additive inverse)
- \( a\mathbf{v} \in V \) (Closure under scalar multiplication)
- \( a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v} \) (Distributivity of scalar multiplication over vector addition)
- \( (a + b)\mathbf{v} = a\mathbf{v} + b\mathbf{v} \) (Distributivity of scalar multiplication over field addition)
- \( a(b\mathbf{v}) = (ab)\mathbf{v} \) (Associativity of scalar multiplication)
- \( 1\mathbf{v} = \mathbf{v} \) (Scalar multiplicative identity, where \( 1 \) is the multiplicative identity in \( F \))
Key Concepts for Vector Spaces
- Linear Combination: A vector \( c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \dots + c_k \mathbf{v}_k \) where \( c_1, \dots, c_k \) are scalars.
- Span: The set of all possible linear combinations of a set of vectors \( \{ \mathbf{v}_1, \dots, \mathbf{v}_k \} \). Denoted \( \text{span}\{ \mathbf{v}_1, \dots, \mathbf{v}_k \} \).
- Linear Independence: A set of vectors \( \{ \mathbf{v}_1, \dots, \mathbf{v}_k \} \) is linearly independent if the only solution to \( c_1 \mathbf{v}_1 + \dots + c_k \mathbf{v}_k = \mathbf{0} \) is \( c_1 = c_2 = \dots = c_k = 0 \).
- Basis: A linearly independent set of vectors that spans the entire vector space. All bases for a given vector space have the same number of vectors.
- Dimension: The number of vectors in any basis for a vector space. Denoted \( \dim(V) \).
- Subspace: A subset of a vector space that is itself a vector space under the same operations (i.e., it is closed under vector addition and scalar multiplication, and contains the zero vector).
Fundamental Subspaces of a Matrix
| Subspace Name | Definition | Space | Dimension | Orthogonal Complement |
| --- | --- | --- | --- | --- |
| Column Space (Image, Range), \( C(A) \) | \( \{ A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n \} \) | \( \mathbb{R}^m \) | \( r = \text{rank}(A) \) | Left Null Space (\( N(A^T) \)) |
| Null Space (Kernel), \( N(A) \) | \( \{ \mathbf{x} : A\mathbf{x} = \mathbf{0} \} \) | \( \mathbb{R}^n \) | \( n - r \) | Row Space (\( C(A^T) \)) |
| Row Space (\( C(A^T) \)) | \( \{ A^T\mathbf{y} : \mathbf{y} \in \mathbb{R}^m \} \) | \( \mathbb{R}^n \) | \( r \) | Null Space (\( N(A) \)) |
| Left Null Space (\( N(A^T) \)) | \( \{ \mathbf{y} : A^T\mathbf{y} = \mathbf{0} \} \) | \( \mathbb{R}^m \) | \( m - r \) | Column Space (\( C(A) \)) |

- Properties of Rank: For matrices \( A \) and \( B \) (dimensions allowing products/sums):
  - \( \text{rank}(A) = \text{rank}(A^T) = \text{rank}(A^T A) = \text{rank}(A A^T) \)
  - \( \text{rank}(AB) \le \min(\text{rank}(A), \text{rank}(B)) \)
  - \( \text{rank}(A + B) \le \text{rank}(A) + \text{rank}(B) \)
Theorem. Rank-Nullity Theorem
For any matrix \( A \in \mathbb{R}^{m \times n} \):
\[ \text{rank}(A) + \text{nullity}(A) = n \quad (\text{number of columns of } A) \]
Orthogonal Complements
- Two subspaces \( S \) and \( S^\perp \) of \( \mathbb{R}^n \) are orthogonal complements if every vector in \( S \) is orthogonal to every vector in \( S^\perp \), and \( S \oplus S^\perp = \mathbb{R}^n \) (their direct sum spans \( \mathbb{R}^n \) and their intersection is \( \{ \mathbf{0} \} \)).
- The table above shows the orthogonal complement relationships for the fundamental subspaces. For example, \( C(A)^\perp = N(A^T) \) in \( \mathbb{R}^m \), and \( C(A^T)^\perp = N(A) \) in \( \mathbb{R}^n \). This implies \( C(A) \oplus N(A^T) = \mathbb{R}^m \) and \( C(A^T) \oplus N(A) = \mathbb{R}^n \).
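These dimension and orthogonality relationships can be verified numerically; the sketch below uses SciPy's `null_space` on an illustrative rank-2 matrix:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 2.0, 1.0, 2.0]])   # third row = first + second, so rank is 2

r = np.linalg.matrix_rank(A)
N = null_space(A)                       # orthonormal basis for N(A), shape (4, n - r)

# Rank-Nullity: rank + nullity = number of columns
assert r + N.shape[1] == A.shape[1]

# Row space is orthogonal to null space: every row of A annihilates N(A)
assert np.allclose(A @ N, 0)
```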
4. Determinants
- Definition: A scalar value associated with a square matrix \( A \), denoted \( \det(A) \) or \( \vert A \vert \).
- For \( n = 1 \): \( A = [a] \), \( \det(A) = a \).
- For \( n = 2 \): \( A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \), \( \det(A) = ad - bc \).
- For \( n \ge 3 \): often defined recursively using cofactor expansion along any row \( i \) (or column \( j \)):
\[ \det(A) = \sum_{j=1}^n (-1)^{i+j} A_{ij} M_{ij} \quad (\text{expansion along row } i) \]
where \( M_{ij} \) is the determinant of the submatrix obtained by deleting row \( i \) and column \( j \) (the \( (i,j) \)-minor).
- Properties:
  - \( \det(I) = 1 \).
  - If \( A \) has a row or column of zeros, \( \det(A) = 0 \).
  - If two rows or two columns of \( A \) are identical, \( \det(A) = 0 \).
  - If \( B \) is obtained from \( A \) by swapping two rows (or two columns), then \( \det(B) = -\det(A) \).
  - If \( B \) is obtained from \( A \) by multiplying a single row (or column) by a scalar \( c \), then \( \det(B) = c \det(A) \).
    - Consequently, for \( A \in \mathbb{R}^{n \times n} \), \( \det(cA) = c^n \det(A) \).
  - If \( B \) is obtained from \( A \) by adding a multiple of one row to another row (or column to column), then \( \det(B) = \det(A) \).
  - \( \det(AB) = \det(A)\det(B) \) for square matrices \( A, B \).
  - \( \det(A^T) = \det(A) \).
  - \( A \) is invertible (non-singular) if and only if \( \det(A) \ne 0 \).
  - If \( A \) is invertible, \( \det(A^{-1}) = 1/\det(A) \).
  - For a triangular matrix (upper or lower), \( \det(A) \) is the product of its diagonal entries.
- Geometric Interpretation: For a matrix \( A \in \mathbb{R}^{n \times n} \), \( \vert \det(A) \vert \) is the factor by which the linear transformation represented by \( A \) scales \( n \)-dimensional volume. The sign of \( \det(A) \) indicates whether the transformation preserves or reverses orientation.
Adjoint Matrix and Cramer’s Rule
- Adjoint (or Adjugate) Matrix: The adjoint of \( A \), denoted \( \text{adj}(A) \), is the transpose of the cofactor matrix of \( A \). That is, \( (\text{adj}(A))_{ij} = (-1)^{i+j} M_{ji} \).
  - Property: \( A \, \text{adj}(A) = \text{adj}(A) \, A = \det(A) \, I \).
  - If \( \det(A) \ne 0 \), then \( A^{-1} = \frac{1}{\det(A)} \text{adj}(A) \).
- Cramer’s Rule: For an invertible matrix \( A \), the unique solution to \( A\mathbf{x} = \mathbf{b} \) is given by:
\[ x_i = \frac{\det(A_i)}{\det(A)} \]
where \( A_i \) is the matrix formed by replacing the \( i \)-th column of \( A \) with the vector \( \mathbf{b} \). Cramer’s rule is computationally inefficient for large systems but is theoretically important.
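Cramer's rule is easy to state in code. The helper below is illustrative only (in practice `np.linalg.solve` is far cheaper):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b for invertible A via Cramer's rule (illustration only)."""
    det_A = np.linalg.det(A)
    x = np.empty(A.shape[0])
    for i in range(A.shape[0]):
        Ai = A.copy()
        Ai[:, i] = b                      # replace the i-th column with b
        x[i] = np.linalg.det(Ai) / det_A
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
assert np.allclose(cramer_solve(A, b), np.linalg.solve(A, b))
```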
5. Eigenvalues and Eigenvectors
- Definition: For a square matrix \( A \in \mathbb{R}^{n \times n} \), a non-zero vector \( \mathbf{v} \) is an eigenvector of \( A \) if \( A\mathbf{v} = \lambda\mathbf{v} \) for some scalar \( \lambda \). The scalar \( \lambda \) is the corresponding eigenvalue.
- Characteristic Equation: Eigenvalues are the roots of the characteristic polynomial \( p(\lambda) = \det(A - \lambda I) \). This is a polynomial in \( \lambda \) of degree \( n \).
- Properties:
  - An \( n \times n \) matrix \( A \) has \( n \) eigenvalues in \( \mathbb{C} \), counting multiplicities.
  - Sum of eigenvalues: \( \sum_{i=1}^n \lambda_i = \text{tr}(A) \)
  - Product of eigenvalues: \( \prod_{i=1}^n \lambda_i = \det(A) \)
  - Eigenvectors corresponding to distinct eigenvalues are linearly independent.
  - If \( A \) is a triangular matrix, its eigenvalues are its diagonal entries.
  - \( A \) and \( A^T \) have the same eigenvalues (but generally different eigenvectors).
  - If \( \lambda \) is an eigenvalue of an invertible matrix \( A \), then \( 1/\lambda \) is an eigenvalue of \( A^{-1} \). The corresponding eigenvector is the same.
  - If \( \lambda \) is an eigenvalue of \( A \), then \( \lambda^k \) is an eigenvalue of \( A^k \) for any positive integer \( k \) (and any integer \( k \) if \( A \) is invertible). The corresponding eigenvector is the same.
  - The set of all eigenvectors corresponding to an eigenvalue \( \lambda \), along with the zero vector, forms a subspace called the eigenspace \( E_\lambda = N(A - \lambda I) \).
- Cayley-Hamilton Theorem: Every square matrix satisfies its own characteristic equation. If \( p(\lambda) = \det(A - \lambda I) \) is the characteristic polynomial of \( A \), then \( p(A) = 0 \).

For Symmetric Matrices (\( A \in \mathbb{R}^{n \times n} \), \( A = A^T \))
- All eigenvalues are real numbers.
- Eigenvectors corresponding to distinct eigenvalues are orthogonal.
- A real symmetric matrix is always orthogonally diagonalizable: there exists an orthogonal matrix \( Q \) whose columns are orthonormal eigenvectors of \( A \), and a diagonal matrix \( \Lambda \) whose diagonal entries are the corresponding eigenvalues, such that \( A = Q \Lambda Q^T \). This is known as the Spectral Theorem.
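The trace/determinant identities and the eigenpair definition can be checked directly (a small sketch on an illustrative matrix whose eigenvalues are 5 and 2):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)

# Sum of eigenvalues = trace; product of eigenvalues = determinant
assert np.isclose(eigvals.sum(), np.trace(A))        # 5 + 2 = 7
assert np.isclose(eigvals.prod(), np.linalg.det(A))  # 5 * 2 = 10

# Each eigenpair satisfies A v = lambda v
for lam, vec in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ vec, lam * vec)
```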
6. Matrix Decompositions
These are ways to factorize a matrix into a product of other matrices with useful properties.
Theorem. Spectral Theorem for Real Symmetric Matrices
If \( A \in \mathbb{R}^{n \times n} \) is a symmetric matrix (i.e., \( A = A^T \)), then there exists an orthogonal matrix \( Q \in \mathbb{R}^{n \times n} \) (whose columns are orthonormal eigenvectors of \( A \)) and a real diagonal matrix \( \Lambda \in \mathbb{R}^{n \times n} \) (whose diagonal entries are the eigenvalues of \( A \) corresponding to the columns of \( Q \)) such that:
\[ A = Q \Lambda Q^T \]
This decomposition allows \( A \) to be expressed as a sum of rank-1 outer products:
\[ A = \sum_{i=1}^n \lambda_i \mathbf{q}_i \mathbf{q}_i^T \]
where \( \mathbf{q}_i \) are the orthonormal eigenvectors (columns of \( Q \)) and \( \lambda_i \) are the corresponding eigenvalues (diagonal entries of \( \Lambda \)).
Theorem. Singular Value Decomposition (SVD)
For any matrix \( A \in \mathbb{R}^{m \times n} \) (not necessarily square or symmetric), there exist orthogonal matrices \( U \in \mathbb{R}^{m \times m} \) and \( V \in \mathbb{R}^{n \times n} \), and a real diagonal matrix \( \Sigma \in \mathbb{R}^{m \times n} \) (meaning only \( (\Sigma)_{ii} \) can be non-zero) with non-negative real numbers \( \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 \) on its “diagonal” (where \( r = \text{rank}(A) \), and \( \sigma_i = 0 \) for \( i > r \)), such that:
\[ A = U \Sigma V^T \]
- The columns of \( U \) are left singular vectors (orthonormal eigenvectors of \( AA^T \)).
- The columns of \( V \) are right singular vectors (orthonormal eigenvectors of \( A^T A \)).
- The diagonal entries of \( \Sigma \) (\( \sigma_i \)) are singular values of \( A \). They are the square roots of the non-zero eigenvalues of both \( A^T A \) and \( AA^T \): \( \sigma_i(A) = \sqrt{\lambda_i(A^T A)} = \sqrt{\lambda_i(AA^T)} \).

SVD provides a way to write \( A \) as a sum of rank-1 matrices (the full SVD sum goes up to \( \min(m,n) \), but terms with \( \sigma_i = 0 \) vanish):
\[ A = \sum_{i=1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i^T \]
where \( \mathbf{u}_i \) is the \( i \)-th column of \( U \) and \( \mathbf{v}_i \) is the \( i \)-th column of \( V \).
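The rank-1 sum and the eigenvalue connection can be checked with NumPy (an illustrative 3x2 matrix):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 0.0]])

U, s, Vt = np.linalg.svd(A)    # full SVD: U is 3x3, s has min(m, n) entries, Vt is 2x2

# Reconstruct A from the rank-1 sum  sum_i sigma_i u_i v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)

# Singular values are square roots of the eigenvalues of A^T A
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s**2, lam)
```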
Other Useful Decompositions
- LU Decomposition: For many square matrices \( A \), one can find a factorization \( A = LU \) (or \( PA = LU \), including a permutation matrix \( P \) for stability/existence), where \( L \) is a lower triangular matrix (often with 1s on its diagonal, making it unit lower triangular) and \( U \) is an upper triangular matrix. This is commonly used to solve systems \( A\mathbf{x} = \mathbf{b} \) efficiently via forward and backward substitution.
- QR Decomposition: Any \( A \in \mathbb{R}^{m \times n} \) can be factored as \( A = QR \), where \( Q \in \mathbb{R}^{m \times m} \) is an orthogonal matrix and \( R \in \mathbb{R}^{m \times n} \) is an upper triangular matrix.
  - If \( m \ge n \) (tall or square matrix), a “thin” or “reduced” QR decomposition is often used: \( A = Q_1 R_1 \), where \( Q_1 \in \mathbb{R}^{m \times n} \) has orthonormal columns and \( R_1 \in \mathbb{R}^{n \times n} \) is upper triangular. The columns of \( Q_1 \) form an orthonormal basis for \( C(A) \) if \( A \) has full column rank. This decomposition can be found by applying the Gram-Schmidt process to the columns of \( A \).
- Cholesky Decomposition: For a real, symmetric, positive definite matrix \( A \), there exists a unique lower triangular matrix \( L \) with strictly positive diagonal entries such that \( A = LL^T \). (Alternatively, \( A = R^T R \) where \( R = L^T \) is upper triangular.) This is often used for solving linear systems with symmetric positive definite coefficient matrices and in statistics (e.g., sampling from multivariate normal distributions).
- **Diagonalization (\( A = PDP^{-1} \)):** An \( n \times n \) matrix \( A \) is diagonalizable if and only if it has \( n \) linearly independent eigenvectors. In this case, \( A = PDP^{-1} \), where \( P \) is an invertible matrix whose columns are the eigenvectors of \( A \), and \( D \) is a diagonal matrix whose diagonal entries are the corresponding eigenvalues.
  - A matrix is diagonalizable if and only if for every eigenvalue \( \lambda \), its geometric multiplicity (dimension of the eigenspace \( E_\lambda \)) equals its algebraic multiplicity (multiplicity as a root of the characteristic polynomial).
  - If an \( n \times n \) matrix has \( n \) distinct eigenvalues, it is diagonalizable.
  - Symmetric matrices are a special case where \( P \) can be chosen to be orthogonal (\( P^{-1} = P^T \)), so \( A = Q \Lambda Q^T \).
7. Positive Definite and Semi-definite Matrices
These definitions apply to symmetric matrices \( A \in \mathbb{R}^{n \times n} \). The quadratic form associated with \( A \) is \( \mathbf{x}^T A \mathbf{x} \).
- **Positive Definite (\( A \succ 0 \)):** \( \mathbf{x}^T A \mathbf{x} > 0 \) for all non-zero \( \mathbf{x} \in \mathbb{R}^n \).
  - Equivalent conditions:
    1. All eigenvalues of \( A \) are strictly positive (\( \lambda_i > 0 \) for all \( i \)).
    2. All leading principal minors of \( A \) are strictly positive. (A leading principal minor of order \( k \) is the determinant of the top-left \( k \times k \) submatrix.)
    3. A Cholesky decomposition \( A = LL^T \) exists where \( L \) is a lower triangular matrix with strictly positive diagonal entries.
- **Positive Semi-definite (\( A \succeq 0 \)):** \( \mathbf{x}^T A \mathbf{x} \ge 0 \) for all \( \mathbf{x} \in \mathbb{R}^n \).
  - Equivalent conditions:
    1. All eigenvalues of \( A \) are non-negative (\( \lambda_i \ge 0 \) for all \( i \)).
    2. All principal minors of \( A \) are non-negative. (A principal minor is the determinant of a submatrix obtained by deleting the same set of rows and columns.)
- **Negative Definite (\( A \prec 0 \)):** \( \mathbf{x}^T A \mathbf{x} < 0 \) for all non-zero \( \mathbf{x} \). (Equivalent to \( -A \succ 0 \); all eigenvalues \( \lambda_i < 0 \).)
- **Negative Semi-definite (\( A \preceq 0 \)):** \( \mathbf{x}^T A \mathbf{x} \le 0 \) for all \( \mathbf{x} \). (Equivalent to \( -A \succeq 0 \); all eigenvalues \( \lambda_i \le 0 \).)
- Indefinite: The matrix \( A \) is neither positive semi-definite nor negative semi-definite. This means the quadratic form \( \mathbf{x}^T A \mathbf{x} \) can take both positive and negative values. (Equivalent to \( A \) having at least one positive eigenvalue and at least one negative eigenvalue.)
- Connection to Convexity: For a twice-differentiable function \( f: \mathbb{R}^n \to \mathbb{R} \), its Hessian matrix \( \nabla^2 f(\mathbf{x}) \) being positive semi-definite (positive definite) over a convex domain implies \( f \) is convex (strictly convex) over that domain. This is fundamental in optimization.
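The eigenvalue conditions above translate directly into a small classifier (an illustrative helper, using `eigvalsh` for symmetric input):

```python
import numpy as np

def definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semi-definite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semi-definite"
    return "indefinite"

assert definiteness(np.array([[2.0, -1.0], [-1.0, 2.0]])) == "positive definite"  # eigs 1, 3
assert definiteness(np.array([[1.0, 2.0], [2.0, 1.0]])) == "indefinite"           # eigs 3, -1
```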
8. Inner Product Spaces
Definition. Inner Product
An inner product on a real vector space \( V \) is a function \( \langle \cdot, \cdot \rangle : V \times V \to \mathbb{R} \) that for all vectors \( \mathbf{u}, \mathbf{v}, \mathbf{w} \in V \) and any scalar \( c \in \mathbb{R} \) satisfies the following properties:
- Symmetry: \( \langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle \)
- Linearity in the first argument: \( \langle c\mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = c\langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle \). (Linearity in the second argument follows from symmetry: \( \langle \mathbf{u}, c\mathbf{v} + \mathbf{w} \rangle = c\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle \).)
- Positive-definiteness: \( \langle \mathbf{v}, \mathbf{v} \rangle \ge 0 \), and \( \langle \mathbf{v}, \mathbf{v} \rangle = 0 \iff \mathbf{v} = \mathbf{0} \)

The standard dot product in \( \mathbb{R}^n \) (\( \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T \mathbf{v} \)) is an example of an inner product. Another example is a weighted inner product \( \langle \mathbf{u}, \mathbf{v} \rangle_W = \mathbf{u}^T W \mathbf{v} \), where \( W \) is a symmetric positive definite matrix. An inner product induces a norm defined by \( \Vert \mathbf{v} \Vert = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} \).
- Cauchy-Schwarz Inequality (general form):
\[ \vert \langle \mathbf{u}, \mathbf{v} \rangle \vert \le \Vert \mathbf{u} \Vert \, \Vert \mathbf{v} \Vert \]
Equality holds if and only if \( \mathbf{u} \) and \( \mathbf{v} \) are linearly dependent.
- Orthogonality: Two vectors \( \mathbf{u} \) and \( \mathbf{v} \) are orthogonal with respect to the inner product if \( \langle \mathbf{u}, \mathbf{v} \rangle = 0 \).
- Orthonormal Basis: A basis \( \{ \mathbf{e}_1, \dots, \mathbf{e}_n \} \) for an \( n \)-dimensional inner product space is orthonormal if \( \langle \mathbf{e}_i, \mathbf{e}_j \rangle = \delta_{ij} \) (Kronecker delta: 1 if \( i = j \), 0 if \( i \ne j \)).
- Gram-Schmidt Process: An algorithm that takes any basis for an inner product space and constructs an orthonormal basis for that space.
- Parseval’s Identity: If \( \{ \mathbf{e}_1, \dots, \mathbf{e}_n \} \) is an orthonormal basis for an inner product space \( V \), then for any vector \( \mathbf{v} \in V \):
\[ \Vert \mathbf{v} \Vert^2 = \sum_{i=1}^n \vert \langle \mathbf{v}, \mathbf{e}_i \rangle \vert^2 \]
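For the standard dot product, the Gram-Schmidt process can be sketched directly (classical variant; in practice the modified variant or a QR routine is more numerically stable):

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    Q = np.zeros_like(V, dtype=float)
    for j in range(V.shape[1]):
        q = V[:, j].astype(float)
        for i in range(j):
            q = q - (Q[:, i] @ V[:, j]) * Q[:, i]   # remove component along earlier q_i
        Q[:, j] = q / np.linalg.norm(q)
    return Q

V = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt(V)
assert np.allclose(Q.T @ Q, np.eye(2))   # columns are orthonormal
```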
9. Matrix Calculus
Common derivatives using the gradient (denominator layout) convention: result dimensions match the variable being differentiated with respect to (e.g., if differentiating w.r.t. a column vector \( \mathbf{x} \in \mathbb{R}^n \), the result is a column vector). Matrix-valued entries below differentiate w.r.t. \( X \).

| Function \( f \) | Derivative | Notes |
| --- | --- | --- |
| \( \mathbf{a}^T \mathbf{x} \) (or \( \mathbf{x}^T \mathbf{a} \)) | \( \mathbf{a} \) | \( \mathbf{a}, \mathbf{x} \in \mathbb{R}^n \). Result is \( n \times 1 \). |
| \( \mathbf{x}^T A \mathbf{x} \) | \( (A + A^T)\mathbf{x} \) | \( A \in \mathbb{R}^{n \times n} \). If \( A \) symmetric, \( 2A\mathbf{x} \). Result is \( n \times 1 \). |
| \( \Vert \mathbf{x} \Vert_2^2 = \mathbf{x}^T \mathbf{x} \) | \( 2\mathbf{x} \) | \( \mathbf{x} \in \mathbb{R}^n \). Result is \( n \times 1 \). |
| \( \text{tr}(X) \) | \( I \) | Result is \( n \times n \) (shape of \( X \)). |
| \( \text{tr}(AX) \) | \( A^T \) | Result is shape of \( X \). |
| \( \text{tr}(A^T X) \) | \( A \) | Result is shape of \( X \). |
| \( \text{tr}(X^T A X) \) | \( (A + A^T)X \) | If \( A \) symmetric, \( 2AX \). Result is shape of \( X \). |
| \( \log \det(X) \) | \( (X^{-1})^T \) | \( X \) positive definite. If \( X \) symmetric, \( X^{-1} \). Result is \( n \times n \). |
| \( \det(X) \) (Jacobi’s Formula) | \( \det(X)\,(X^{-1})^T \) | \( X \) invertible. If \( X \) symmetric, \( \det(X)\,X^{-1} \). Result is \( n \times n \). |
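Table entries such as \( \partial(\mathbf{x}^T A \mathbf{x})/\partial \mathbf{x} = (A + A^T)\mathbf{x} \) can be sanity-checked with finite differences (a quick sketch on random data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

analytic = (A + A.T) @ x     # derivative of x^T A x from the table

# Central-difference approximation of the gradient, one coordinate at a time
eps = 1e-6
numeric = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(analytic, numeric, atol=1e-4)
```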
10. Miscellaneous Identities and Facts
- Woodbury Matrix Identity (Matrix Inversion Lemma): Allows efficient computation of the inverse of a rank-\( k \) corrected matrix:
\[ (A + UCV)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1} \]
where \( A \in \mathbb{R}^{n \times n} \), \( U \in \mathbb{R}^{n \times k} \), \( C \in \mathbb{R}^{k \times k} \), \( V \in \mathbb{R}^{k \times n} \). Assumes \( A \) and \( C \) are invertible.
- Sherman-Morrison Formula (rank-1 update): A special case of the Woodbury identity where \( k = 1 \) (i.e., \( U = \mathbf{u} \in \mathbb{R}^n \), \( C = 1 \), \( V = \mathbf{v}^T \)):
\[ (A + \mathbf{u}\mathbf{v}^T)^{-1} = A^{-1} - \frac{A^{-1} \mathbf{u} \mathbf{v}^T A^{-1}}{1 + \mathbf{v}^T A^{-1} \mathbf{u}} \]
Assumes \( A \) is invertible and \( 1 + \mathbf{v}^T A^{-1} \mathbf{u} \ne 0 \).
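The rank-1 update formula is easy to verify numerically (illustrative matrices; a real application would avoid forming inverses explicitly):

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 3.0]])
u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 1.0])

A_inv = np.linalg.inv(A)
denom = 1.0 + v @ A_inv @ u                      # must be non-zero
updated_inv = A_inv - np.outer(A_inv @ u, v @ A_inv) / denom

# Agrees with inverting the rank-1-updated matrix directly
assert np.allclose(updated_inv, np.linalg.inv(A + np.outer(u, v)))
```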
- Projection Matrix:
  - The matrix that orthogonally projects vectors onto the column space \( C(A) \) of a matrix \( A \in \mathbb{R}^{m \times n} \) with linearly independent columns (i.e., \( \text{rank}(A) = n \)) is:
\[ P_A = A(A^T A)^{-1} A^T \]
  - For any \( \mathbf{b} \in \mathbb{R}^m \), \( P_A \mathbf{b} \) is the vector in \( C(A) \) closest to \( \mathbf{b} \) (in the \( \ell_2 \) sense).
  - Properties of orthogonal projection matrices: \( P^2 = P \) (idempotent) and \( P^T = P \) (symmetric).
- **Relationship between \( A^T A \) and \( AA^T \) for \( A \in \mathbb{R}^{m \times n} \):**
  - Both \( A^T A \) and \( AA^T \) are symmetric and positive semi-definite.
  - They have the same non-zero eigenvalues. Their non-zero eigenvalues are the squares of the non-zero singular values of \( A \).
  - \( \text{rank}(A^T A) = \text{rank}(AA^T) = \text{rank}(A) \).
  - If \( A \) has full column rank (\( \text{rank}(A) = n \)), then \( A^T A \) is positive definite (and thus invertible).
  - If \( A \) has full row rank (\( \text{rank}(A) = m \)), then \( AA^T \) is positive definite (and thus invertible).

---

This list is intended as a compact reference. For deeper understanding, proofs, and further topics, consult standard linear algebra textbooks or Wikipedia.