Cheat Sheet: Linear Algebra

A quick reference guide compiling essential theorems, identities, facts, and formulas from linear algebra, designed for the crash course on mathematical foundations for machine learning and optimization.

Linear Algebra: Key Concepts and Formulas

This quick reference guide covers essential theorems, identities, and facts from linear algebra, particularly relevant for machine learning and optimization.

1. Vectors

Definitions & Operations

  • Vector: An element of a vector space, often represented as an array of numbers (components). Geometrically, a quantity with magnitude and direction.

    \[ \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n \]
  • Vector Addition:

    \[ \mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1+v_1 \\ \vdots \\ u_n+v_n \end{bmatrix} \]
    • Properties: Commutative (\( \mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u} \)), Associative (\( (\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w}) \))

  • Scalar Multiplication:

    \[ c\mathbf{v} = \begin{bmatrix} cv_1 \\ \vdots \\ cv_n \end{bmatrix} \]
    • Properties: Distributive over vector addition and scalar addition; associative with scalar multiplication
  • **Dot Product (Standard Inner Product for \( \mathbb{R}^n \)):**

    \[ \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T \mathbf{v} = \sum_{i=1}^n u_i v_i \]
  • Properties:
    1. Commutative: \( \mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u} \)
    2. Distributive: \( \mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w} \)
    3. Bilinear: \( (c\mathbf{u}) \cdot \mathbf{v} = c(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (c\mathbf{v}) \)
    4. Positive-definiteness: \( \mathbf{v} \cdot \mathbf{v} = \Vert \mathbf{v} \Vert_2^2 \ge 0 \), and \( \mathbf{v} \cdot \mathbf{v} = 0 \iff \mathbf{v} = \mathbf{0} \)

  • Angle between two non-zero vectors:

    \[ \cos \theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\Vert \mathbf{u} \Vert_2 \Vert \mathbf{v} \Vert_2} \]
  • Orthogonal Vectors: \( \mathbf{u} \cdot \mathbf{v} = 0 \) (implies \( \theta = \pi/2 \), i.e. \( 90^\circ \), if the vectors are non-zero).

  • Vector Projection: The projection of vector \( \mathbf{a} \) onto a non-zero vector \( \mathbf{b} \) is:

    \[ \text{proj}_{\mathbf{b}} \mathbf{a} = \frac{\mathbf{a} \cdot \mathbf{b}}{\Vert \mathbf{b} \Vert_2^2} \mathbf{b} \]

    The scalar component of \( \mathbf{a} \) along \( \mathbf{b} \) is \( \frac{\mathbf{a} \cdot \mathbf{b}}{\Vert \mathbf{b} \Vert_2} \).
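
These formulas map directly onto numpy; a minimal sketch (the example vectors are arbitrary, and numpy is assumed available):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

dot = a @ b                                # a·b = a^T b
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(cos_theta)               # angle between a and b

proj = (a @ b) / (b @ b) * b               # proj_b(a) = (a·b / ||b||_2^2) b
scalar_comp = (a @ b) / np.linalg.norm(b)  # scalar component of a along b
print(dot, np.degrees(theta), proj, scalar_comp)
```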

Vector Norms

  • Definition: A function \( \Vert \cdot \Vert : V \to \mathbb{R} \) such that for all \( c \in \mathbb{R} \) and \( \mathbf{u}, \mathbf{v} \in V \):
    1. \( \Vert \mathbf{v} \Vert \ge 0 \) (Non-negativity)
    2. \( \Vert \mathbf{v} \Vert = 0 \iff \mathbf{v} = \mathbf{0} \) (Definiteness)
    3. \( \Vert c\mathbf{v} \Vert = \vert c \vert \Vert \mathbf{v} \Vert \) (Absolute homogeneity)
    4. \( \Vert \mathbf{u} + \mathbf{v} \Vert \le \Vert \mathbf{u} \Vert + \Vert \mathbf{v} \Vert \) (Triangle Inequality)

  • **Common \( L_p \) Norms:** For \( \mathbf{v} \in \mathbb{R}^n \):
    • \( L_1 \) norm (Manhattan norm): \( \Vert \mathbf{v} \Vert_1 = \sum_{i=1}^n \vert v_i \vert \)
    • \( L_2 \) norm (Euclidean norm): \( \Vert \mathbf{v} \Vert_2 = \sqrt{\sum_{i=1}^n v_i^2} = \sqrt{\mathbf{v}^T \mathbf{v}} \)
    • \( L_p \) norm (\( p \ge 1 \)): \( \Vert \mathbf{v} \Vert_p = \left( \sum_{i=1}^n \vert v_i \vert^p \right)^{1/p} \)
    • \( L_\infty \) norm (Max norm): \( \Vert \mathbf{v} \Vert_\infty = \max_{1 \le i \le n} \vert v_i \vert \)

  • Cauchy-Schwarz Inequality:

    \[ \vert \mathbf{u} \cdot \mathbf{v} \vert \le \Vert \mathbf{u} \Vert_2 \Vert \mathbf{v} \Vert_2 \]

    More generally, for inner products: \( \vert \langle \mathbf{u}, \mathbf{v} \rangle \vert \le \Vert \mathbf{u} \Vert \Vert \mathbf{v} \Vert \). Equality holds if and only if \( \mathbf{u} \) and \( \mathbf{v} \) are linearly dependent.
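
All of the norms above are available through `np.linalg.norm`; a quick sketch that also spot-checks Cauchy-Schwarz on example vectors:

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0])
u = np.array([0.5, 4.0, -1.0])

l1 = np.linalg.norm(v, 1)          # sum of |v_i|
l2 = np.linalg.norm(v, 2)          # Euclidean norm
l3 = np.linalg.norm(v, 3)          # general L_p with p = 3
linf = np.linalg.norm(v, np.inf)   # max |v_i|

# Cauchy-Schwarz: |u·v| <= ||u||_2 ||v||_2
assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v)
print(l1, l2, l3, linf)
```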

2. Matrices

Definitions & Special Matrices

  • Matrix: A rectangular array of numbers, \( A \in \mathbb{R}^{m \times n} \).

| Type | Definition / Condition | Key Notes |
| --- | --- | --- |
| Identity (\( I_n \) or \( I \)) | Square matrix; \( A_{ii}=1 \), \( A_{ij}=0 \) for \( i \ne j \) | \( AI=IA=A \) |
| Zero (\( 0 \)) | All entries \( A_{ij}=0 \) | |
| Diagonal | \( A_{ij}=0 \) for \( i \ne j \) | |
| Symmetric | \( A = A^T \) (square matrix) | Real eigenvalues, orthogonally diagonalizable |
| Skew-Symmetric | \( A = -A^T \) (square matrix) | Diagonal elements \( A_{ii}=0 \). Eigenvalues are 0 or purely imaginary. |
| Upper Triangular | \( A_{ij}=0 \) for \( i > j \) (square matrix) | Eigenvalues are the diagonal entries |
| Lower Triangular | \( A_{ij}=0 \) for \( i < j \) (square matrix) | Eigenvalues are the diagonal entries |
| Orthogonal (\( Q \)) | Square matrix; \( Q^T Q = Q Q^T = I \) | \( Q^{-1}=Q^T \), columns/rows form an orthonormal basis, \( \det(Q)=\pm 1 \), preserves dot products and \( L_2 \) norms |
| Normal | Square matrix; \( AA^T = A^T A \) (real) or \( AA^H = A^H A \) (complex) | Unitarily diagonalizable. Includes symmetric, skew-symmetric, and orthogonal matrices. |

Matrix Operations

  • Matrix Addition/Subtraction: Element-wise. Matrices must have the same dimensions.
  • Scalar Multiplication: Multiply each element by the scalar.
  • Matrix Multiplication: If \( A \in \mathbb{R}^{m \times n} \) and \( B \in \mathbb{R}^{n \times p} \), then \( C = AB \in \mathbb{R}^{m \times p} \), where

    \[ C_{ij} = \sum_{k=1}^n A_{ik} B_{kj} \]

  • Properties:
    1. Associative: \( (AB)C = A(BC) \)
    2. Distributive: \( A(B+C) = AB + AC \) and \( (B+C)D = BD + CD \)
    3. **Not Commutative** in general: \( AB \ne BA \)

  • **Transpose (\( A^T \)):** Rows become columns (and vice versa): \( (A^T)_{ij} = A_{ji} \).

  • Properties:
    1. \( (A^T)^T = A \)
    2. \( (A+B)^T = A^T + B^T \)
    3. \( (cA)^T = cA^T \)
    4. \( (AB)^T = B^T A^T \) (reverse order)

  • **Matrix Inverse (\( A^{-1} \)):** For a square matrix \( A \), the inverse \( A^{-1} \), if it exists, satisfies \( AA^{-1} = A^{-1}A = I \). \( A \) is invertible (or non-singular) if \( A^{-1} \) exists, which is the case if and only if \( \det(A) \ne 0 \).

  • Properties:
    1. \( (A^{-1})^{-1} = A \)
    2. \( (AB)^{-1} = B^{-1}A^{-1} \) (if \( A \), \( B \) are invertible; reverse order)
    3. \( (A^T)^{-1} = (A^{-1})^T \)

  • **Trace (\( \text{tr}(A) \)):** Sum of the diagonal elements of a square matrix \( A \): \( \text{tr}(A) = \sum_{i=1}^n A_{ii} \)

  • Properties:
    1. \( \text{tr}(A+B) = \text{tr}(A) + \text{tr}(B) \)
    2. \( \text{tr}(cA) = c \cdot \text{tr}(A) \)
    3. \( \text{tr}(AB) = \text{tr}(BA) \) (cyclic property; valid whenever \( AB \) and \( BA \) are both square, e.g., \( A \in \mathbb{R}^{m \times n}, B \in \mathbb{R}^{n \times m} \))
    4. \( \text{tr}(A) = \text{tr}(A^T) \)
    5. \( \text{tr}(ABC) = \text{tr}(BCA) = \text{tr}(CAB) \)
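
A few of these identities, spot-checked on random matrices (an illustrative sketch; the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

assert np.allclose((A @ B).T, B.T @ A.T)                      # (AB)^T = B^T A^T
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))       # (AB)^-1 = B^-1 A^-1
assert np.allclose(np.trace(A @ B), np.trace(B @ A))          # tr(AB) = tr(BA)
assert np.allclose(np.trace(A @ B @ C), np.trace(C @ A @ B))  # cyclic property
```
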
  • **Hadamard Product (element-wise product, \( A \odot B \)):** \( (A \odot B)_{ij} = A_{ij} B_{ij} \). Matrices must have the same dimensions.

  • **Moore-Penrose Pseudoinverse (\( A^+ \)):** For any matrix \( A \in \mathbb{R}^{m \times n} \), the pseudoinverse \( A^+ \in \mathbb{R}^{n \times m} \) is the unique matrix satisfying:
    1. \( AA^+A = A \)
    2. \( A^+AA^+ = A^+ \)
    3. \( (AA^+)^T = AA^+ \) (so \( AA^+ \) is symmetric)
    4. \( (A^+A)^T = A^+A \) (so \( A^+A \) is symmetric)

  • If \( A \) has full column rank (\( n \le m \), \( \text{rank}(A)=n \)), then \( A^+ = (A^T A)^{-1}A^T \). In this case, \( A^+A = I_n \).

  • If \( A \) has full row rank (\( m \le n \), \( \text{rank}(A)=m \)), then \( A^+ = A^T(AA^T)^{-1} \). In this case, \( AA^+ = I_m \).

  • Computation via SVD: If \( A = U\Sigma V^T \) is the SVD of \( A \), then \( A^+ = V\Sigma^+U^T \), where \( \Sigma^+ \) is formed by taking the reciprocal of each non-zero singular value in \( \Sigma \) and transposing the resulting matrix.

  • Use: The vector \( \mathbf{x}^+ = A^+\mathbf{b} \) is the minimum-norm (smallest \( \Vert \mathbf{x} \Vert_2 \)) least-squares solution to \( A\mathbf{x} = \mathbf{b} \).
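
In numpy, `np.linalg.pinv` computes \( A^+ \) via the SVD; a minimal sketch on a random full-column-rank example:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # full column rank with probability 1
b = rng.standard_normal(5)

A_pinv = np.linalg.pinv(A)        # pseudoinverse computed via the SVD
x = A_pinv @ b                    # least-squares solution to Ax = b

# With full column rank, pinv agrees with (A^T A)^{-1} A^T and with lstsq:
assert np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T)
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```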

Matrix Norms

  • Operator Norm (Induced Norm):

    \[ \Vert A \Vert_p = \sup_{\mathbf{x} \ne \mathbf{0}} \frac{\Vert A\mathbf{x} \Vert_p}{\Vert \mathbf{x} \Vert_p} = \sup_{\Vert \mathbf{x} \Vert_p = 1} \Vert A\mathbf{x} \Vert_p \]
    • Spectral Norm (\( p=2 \)): \( \Vert A \Vert_2 = \sigma_{\max}(A) \), the largest singular value of \( A \):

      \[ \Vert A \Vert_2 = \sqrt{\lambda_{\max}(A^T A)} \]

      where \( \lambda_{\max}(M) \) is the largest eigenvalue of \( M \).

  • General property: \( \Vert AB \Vert_p \le \Vert A \Vert_p \Vert B \Vert_p \) (submultiplicativity for induced norms).

  • Frobenius Norm:

    \[ \Vert A \Vert_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n \vert A_{ij} \vert^2} = \sqrt{\text{tr}(A^T A)} \]
    • Properties:
      1. \( \Vert A \Vert_F^2 = \sum_{i=1}^{\min(m,n)} \sigma_i^2(A) \) (sum of squared singular values of \( A \))
      2. Submultiplicative: \( \Vert AB \Vert_F \le \Vert A \Vert_F \Vert B \Vert_F \)
      3. \( \Vert A \Vert_2 \le \Vert A \Vert_F \le \sqrt{\text{rank}(A)} \Vert A \Vert_2 \)
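
Both norms, and their relation to the singular values, can be checked with numpy (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

assert np.isclose(np.linalg.norm(A, 2), s[0])                      # ||A||_2 = sigma_max
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))  # Frobenius norm
assert np.linalg.norm(A, 2) <= np.linalg.norm(A, 'fro')
```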

3. Linear Systems, Vector Spaces & Subspaces

Linear Transformations

Definition. Linear Transformation

A transformation (or mapping) \( T: V \to W \) from a vector space \( V \) to a vector space \( W \) is linear if for all vectors \( \mathbf{u}, \mathbf{v} \in V \) and all scalars \( c \):

  1. \( T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \) (Additivity)
  2. \( T(c\mathbf{u}) = cT(\mathbf{u}) \) (Homogeneity)

Any linear transformation \( T: \mathbb{R}^n \to \mathbb{R}^m \) can be represented by matrix multiplication, \( T(\mathbf{x}) = A\mathbf{x} \), for a unique matrix \( A \in \mathbb{R}^{m \times n} \). The columns of \( A \) are the images of the standard basis vectors of \( \mathbb{R}^n \) under \( T \).

Systems of Linear Equations

  • A system of linear equations can be written as \( A\mathbf{x} = \mathbf{b} \), where \( A \in \mathbb{R}^{m \times n} \) is the coefficient matrix, \( \mathbf{x} \in \mathbb{R}^n \) is the vector of unknowns, and \( \mathbf{b} \in \mathbb{R}^m \) is the constant vector.

  • A solution exists if and only if \( \mathbf{b} \in \text{Col}(A) \) (the column space of \( A \)). Equivalently, \( \text{rank}(A) = \text{rank}([A \mid \mathbf{b}]) \).

  • If a solution exists:
    • It is unique if and only if \( \text{Null}(A) = \{\mathbf{0}\} \) (i.e., the columns of \( A \) are linearly independent). This implies \( n \le m \) and \( \text{rank}(A) = n \).
    • If \( A \) is square (\( m=n \)) and invertible (\( \det(A) \ne 0 \)), the unique solution is \( \mathbf{x} = A^{-1}\mathbf{b} \).
    • If there are free variables (i.e., \( \text{nullity}(A) > 0 \)), there are infinitely many solutions. The general solution is \( \mathbf{x}_p + \mathbf{x}_h \), where \( \mathbf{x}_p \) is a particular solution and \( \mathbf{x}_h \) is any solution to the homogeneous equation \( A\mathbf{x}=\mathbf{0} \).
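
A minimal numpy sketch of both common cases (square invertible vs. overdetermined; the matrices are arbitrary examples):

```python
import numpy as np

# Square, invertible: unique solution x = A^{-1} b (via solve, not inv).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)

# Overdetermined (m > n): no exact solution in general; use least squares.
A_tall = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b_tall = np.array([1.0, 2.0, 2.5])
x_ls, residual, rank, sv = np.linalg.lstsq(A_tall, b_tall, rcond=None)
```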

Vector Space Axioms

Definition. Vector Space

A set \( V \) equipped with two operations, vector addition (\( + \)) and scalar multiplication (\( \cdot \)), is a vector space over a field \( \mathbb{F} \) (typically \( \mathbb{R} \) or \( \mathbb{C} \)) if it satisfies the following axioms for all vectors \( \mathbf{u}, \mathbf{v}, \mathbf{w} \in V \) and scalars \( c, d \in \mathbb{F} \):

  1. \( \mathbf{u} + \mathbf{v} \in V \) (Closure under addition)
  2. \( \mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u} \) (Commutativity of addition)
  3. \( (\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w}) \) (Associativity of addition)
  4. There exists a zero vector \( \mathbf{0} \in V \) such that \( \mathbf{v} + \mathbf{0} = \mathbf{v} \) for all \( \mathbf{v} \in V \) (Additive identity)
  5. For every \( \mathbf{v} \in V \), there exists an additive inverse \( -\mathbf{v} \in V \) such that \( \mathbf{v} + (-\mathbf{v}) = \mathbf{0} \) (Additive inverse)
  6. \( c\mathbf{v} \in V \) (Closure under scalar multiplication)
  7. \( c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v} \) (Distributivity of scalar multiplication over vector addition)
  8. \( (c+d)\mathbf{v} = c\mathbf{v} + d\mathbf{v} \) (Distributivity of scalar multiplication over field addition)
  9. \( c(d\mathbf{v}) = (cd)\mathbf{v} \) (Associativity of scalar multiplication)
  10. \( 1\mathbf{v} = \mathbf{v} \) (Scalar multiplicative identity, where \( 1 \) is the multiplicative identity in \( \mathbb{F} \))

Key Concepts for Vector Spaces

  • Linear Combination: A vector \( \mathbf{w} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_k\mathbf{v}_k \), where the \( c_i \) are scalars.

  • Span: The set of all linear combinations of a set of vectors \( \{\mathbf{v}_1, \dots, \mathbf{v}_k\} \). Denoted \( \text{span}\{\mathbf{v}_1, \dots, \mathbf{v}_k\} \).

  • Linear Independence: A set of vectors \( \{\mathbf{v}_1, \dots, \mathbf{v}_k\} \) is linearly independent if the only solution to \( c_1\mathbf{v}_1 + \dots + c_k\mathbf{v}_k = \mathbf{0} \) is \( c_1 = c_2 = \dots = c_k = 0 \).

  • Basis: A linearly independent set of vectors that spans the entire vector space. All bases for a given vector space have the same number of vectors.

  • Dimension: The number of vectors in any basis for a vector space. Denoted \( \dim(V) \).

  • Subspace: A subset of a vector space that is itself a vector space under the same operations (i.e., it is closed under vector addition and scalar multiplication, and contains the zero vector).

Fundamental Subspaces of a Matrix

For \( A \in \mathbb{R}^{m \times n} \):

| Subspace Name | Definition | Space | Dimension | Orthogonal Complement |
| --- | --- | --- | --- | --- |
| Column Space (Image, Range) | \( \text{Col}(A) = \{ A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n \} \) | \( \mathbb{R}^m \) | \( \text{rank}(A) \) | Left Null Space (\( \text{Null}(A^T) \)) |
| Null Space (Kernel) | \( \text{Null}(A) = \{ \mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0} \} \) | \( \mathbb{R}^n \) | \( \text{nullity}(A) = n - \text{rank}(A) \) | Row Space (\( \text{Row}(A) \)) |
| Row Space (\( \text{Col}(A^T) \)) | \( \text{Row}(A) = \{ A^T\mathbf{y} \mid \mathbf{y} \in \mathbb{R}^m \} \) | \( \mathbb{R}^n \) | \( \text{rank}(A) \) | Null Space (\( \text{Null}(A) \)) |
| Left Null Space (\( \text{Null}(A^T) \)) | \( \text{Null}(A^T) = \{ \mathbf{y} \in \mathbb{R}^m \mid A^T\mathbf{y} = \mathbf{0} \} \) | \( \mathbb{R}^m \) | \( m - \text{rank}(A) \) | Column Space (\( \text{Col}(A) \)) |

  • Properties of Rank: For matrices \( A \) and \( B \) (with dimensions allowing the products/sums):
    1. \( 0 \le \text{rank}(A) \le \min(m, n) \)
    2. \( \text{rank}(AB) \le \min(\text{rank}(A), \text{rank}(B)) \)
    3. \( \text{rank}(A+B) \le \text{rank}(A) + \text{rank}(B) \)
    4. \( \text{rank}(A) = \text{rank}(A^T) = \text{rank}(A^T A) = \text{rank}(AA^T) \)

Theorem. Rank-Nullity Theorem

For any matrix \( A \in \mathbb{R}^{m \times n} \):

\[ \text{rank}(A) + \text{nullity}(A) = n \quad (\text{number of columns of } A) \]
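
Rank-nullity is easy to see numerically: `matrix_rank` gives \( \text{rank}(A) \), and the trailing rows of \( V^T \) from the SVD span \( \text{Null}(A) \) (sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # rank 1, so nullity = 3 - 1 = 2

r = np.linalg.matrix_rank(A)
U, s, Vt = np.linalg.svd(A)
null_basis = Vt[r:].T               # columns span Null(A)

assert r + null_basis.shape[1] == A.shape[1]   # rank + nullity = n
assert np.allclose(A @ null_basis, 0)          # each basis vector is in Null(A)
```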

Orthogonal Complements

  • Two subspaces \( V \) and \( W \) of \( \mathbb{R}^k \) are orthogonal complements if every vector in \( V \) is orthogonal to every vector in \( W \), and \( V \oplus W = \mathbb{R}^k \) (their direct sum spans \( \mathbb{R}^k \) and their intersection is \( \{\mathbf{0}\} \)). Denoted \( W = V^\perp \).

  • The table above shows the orthogonal-complement relationships for the fundamental subspaces. For example, \( \text{Col}(A)^\perp = \text{Null}(A^T) \) in \( \mathbb{R}^m \), and \( \text{Row}(A)^\perp = \text{Null}(A) \) in \( \mathbb{R}^n \). This implies \( \mathbb{R}^m = \text{Col}(A) \oplus \text{Null}(A^T) \) and \( \mathbb{R}^n = \text{Row}(A) \oplus \text{Null}(A) \).

4. Determinants

  • Definition: A scalar value associated with a square matrix \( A \in \mathbb{R}^{n \times n} \), denoted \( \det(A) \) or \( \vert A \vert \).

  • For \( n=1 \), \( A = [a_{11}] \), \( \det(A) = a_{11} \).

  • For \( n=2 \), \( A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \), \( \det(A) = ad - bc \).

  • For \( n > 2 \), it is often defined recursively via cofactor expansion along any row \( i \) (or column \( j \)):

    \[ \det(A) = \sum_{j=1}^n (-1)^{i+j} A_{ij} M_{ij} \quad (\text{expansion along row } i) \]

    where \( M_{ij} \) is the determinant of the submatrix obtained by deleting row \( i \) and column \( j \) (the \( (i,j) \)-minor).

  • Properties:
    1. \( \det(I) = 1 \).
    2. If \( A \) has a row or column of zeros, \( \det(A) = 0 \).
    3. If two rows or two columns of \( A \) are identical, \( \det(A) = 0 \).
    4. If \( B \) is obtained from \( A \) by swapping two rows (or two columns), then \( \det(B) = -\det(A) \).
    5. If \( B \) is obtained from \( A \) by multiplying a single row (or column) by a scalar \( c \), then \( \det(B) = c \cdot \det(A) \).
    6. Consequently, for \( A \in \mathbb{R}^{n \times n} \), \( \det(cA) = c^n \det(A) \).
    7. If \( B \) is obtained from \( A \) by adding a multiple of one row to another row (or of one column to another column), then \( \det(B) = \det(A) \).
    8. \( \det(AB) = \det(A)\det(B) \) for square matrices \( A, B \).
    9. \( \det(A^T) = \det(A) \).
    10. \( A \) is invertible (non-singular) if and only if \( \det(A) \ne 0 \).
    11. If \( A \) is invertible, \( \det(A^{-1}) = 1/\det(A) = (\det(A))^{-1} \).
    12. For a triangular matrix (upper or lower), \( \det(A) \) is the product of its diagonal entries.

  • Geometric Interpretation: For a matrix \( A \in \mathbb{R}^{n \times n} \), \( \vert \det(A) \vert \) is the factor by which the linear transformation represented by \( A \) scales \( n \)-dimensional volume. The sign of \( \det(A) \) indicates whether the transformation preserves or reverses orientation.

Adjoint Matrix and Cramer’s Rule

  • Adjoint (or Adjugate) Matrix: The adjoint of \( A \), denoted \( \text{adj}(A) \) or \( \text{Adj}(A) \), is the transpose of the cofactor matrix of \( A \). That is, \( (\text{adj}(A))_{ij} = C_{ji} = (-1)^{j+i} M_{ji} \).

  • Property: \( A \cdot \text{adj}(A) = \text{adj}(A) \cdot A = \det(A)I \).

  • If \( \det(A) \ne 0 \), then \( A^{-1} = \frac{1}{\det(A)} \text{adj}(A) \).

  • Cramer’s Rule: For an invertible matrix \( A \), the unique solution to \( A\mathbf{x} = \mathbf{b} \) is given by:

    \[ x_i = \frac{\det(A_i)}{\det(A)} \]

    where \( A_i \) is the matrix formed by replacing the \( i \)-th column of \( A \) with the vector \( \mathbf{b} \). Cramer’s rule is computationally inefficient for large systems but theoretically important.
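
A direct numpy implementation of Cramer’s rule, for illustration only (use `np.linalg.solve` in practice):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b for a square invertible A via Cramer's rule."""
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        A_i = A.copy()
        A_i[:, i] = b               # replace the i-th column with b
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
assert np.allclose(cramer_solve(A, b), np.linalg.solve(A, b))
```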

5. Eigenvalues and Eigenvectors

  • Definition: For a square matrix \( A \in \mathbb{R}^{n \times n} \), a non-zero vector \( \mathbf{v} \in \mathbb{C}^n \) is an eigenvector of \( A \) if \( A\mathbf{v} = \lambda\mathbf{v} \) for some scalar \( \lambda \in \mathbb{C} \). The scalar \( \lambda \) is the corresponding eigenvalue.

  • Characteristic Equation: The eigenvalues are the roots of the characteristic polynomial \( p(\lambda) = \det(A - \lambda I) = 0 \), a polynomial in \( \lambda \) of degree \( n \).

  • Properties:
    1. An \( n \times n \) matrix \( A \) has \( n \) eigenvalues in \( \mathbb{C} \), counting multiplicities.
    2. Sum of eigenvalues: \( \sum_{i=1}^n \lambda_i = \text{tr}(A) \)
    3. Product of eigenvalues: \( \prod_{i=1}^n \lambda_i = \det(A) \)
    4. Eigenvectors corresponding to distinct eigenvalues are linearly independent.
    5. If \( A \) is a triangular matrix, its eigenvalues are its diagonal entries.
    6. \( A \) and \( A^T \) have the same eigenvalues (but generally different eigenvectors).
    7. If \( \lambda \) is an eigenvalue of an invertible matrix \( A \), then \( 1/\lambda \) is an eigenvalue of \( A^{-1} \), with the same eigenvector.
    8. If \( \lambda \) is an eigenvalue of \( A \), then \( \lambda^k \) is an eigenvalue of \( A^k \) for any integer \( k \ge 0 \), with the same eigenvector.
    9. The set of all eigenvectors corresponding to an eigenvalue \( \lambda \), together with the zero vector, forms a subspace called the eigenspace \( E_\lambda = \text{Null}(A - \lambda I) \).
    10. Cayley-Hamilton Theorem: Every square matrix satisfies its own characteristic equation: if \( p(\lambda) = c_n \lambda^n + \dots + c_1 \lambda + c_0 \) is the characteristic polynomial of \( A \), then \( p(A) = c_n A^n + \dots + c_1 A + c_0 I = 0 \).
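
Properties 2 and 3 (trace and determinant as sum and product of eigenvalues) are easy to verify numerically (sketch; eigenvalues of a non-symmetric real matrix may be complex):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
eigvals = np.linalg.eigvals(A)       # complex in general

assert np.isclose(eigvals.sum(), np.trace(A))        # sum = tr(A)
assert np.isclose(eigvals.prod(), np.linalg.det(A))  # product = det(A)
```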

For Symmetric Matrices (\( A = A^T \), \( A \in \mathbb{R}^{n \times n} \))

  1. All eigenvalues are real numbers.
  2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  3. A real symmetric matrix is always orthogonally diagonalizable: there exists an orthogonal matrix \( Q \) whose columns are orthonormal eigenvectors of \( A \), and a diagonal matrix \( \Lambda \) whose diagonal entries are the corresponding eigenvalues, such that \( A = Q\Lambda Q^T \). This is known as the Spectral Theorem.

6. Matrix Decompositions

These are ways to factorize a matrix into a product of other matrices with useful properties.

Theorem. Spectral Theorem for Real Symmetric Matrices

If \( A \in \mathbb{R}^{n \times n} \) is a symmetric matrix (i.e., \( A = A^T \)), then there exists an orthogonal matrix \( Q \in \mathbb{R}^{n \times n} \) (whose columns are orthonormal eigenvectors of \( A \)) and a real diagonal matrix \( \Lambda \in \mathbb{R}^{n \times n} \) (whose diagonal entries are the eigenvalues of \( A \) corresponding to the columns of \( Q \)) such that:

\[ A = Q \Lambda Q^T \]

This decomposition expresses \( A \) as a sum of rank-1 outer products:

\[ A = \sum_{i=1}^n \lambda_i \mathbf{q}_i \mathbf{q}_i^T \]

where the \( \mathbf{q}_i \) are the orthonormal eigenvectors (columns of \( Q \)) and the \( \lambda_i \) are the corresponding eigenvalues (diagonal entries of \( \Lambda \)).
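
In numpy, `np.linalg.eigh` computes this decomposition for symmetric matrices; a minimal sketch including the rank-1 expansion:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                    # symmetrize an arbitrary matrix

lam, Q = np.linalg.eigh(A)           # real eigenvalues, orthonormal eigenvectors
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)   # A = Q Λ Q^T
assert np.allclose(Q.T @ Q, np.eye(4))          # Q is orthogonal

# Sum of rank-1 outer products: A = Σ_i λ_i q_i q_i^T
A_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4))
assert np.allclose(A_rebuilt, A)
```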

Theorem. Singular Value Decomposition (SVD)

For any matrix \( A \in \mathbb{R}^{m \times n} \) (not necessarily square or symmetric), there exist orthogonal matrices \( U \in \mathbb{R}^{m \times m} \) and \( V \in \mathbb{R}^{n \times n} \), and a rectangular diagonal matrix \( \Sigma \in \mathbb{R}^{m \times n} \) (meaning only the entries \( \Sigma_{ii} \) can be non-zero) with non-negative real numbers \( \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 \) on its “diagonal” (where \( r = \text{rank}(A) \), and \( \sigma_i = 0 \) for \( i > r \)), such that:

\[ A = U \Sigma V^T \]

  • The columns of \( U \) are the left singular vectors (orthonormal eigenvectors of \( AA^T \)).

  • The columns of \( V \) are the right singular vectors (orthonormal eigenvectors of \( A^T A \)).

  • The diagonal entries \( \sigma_i \) of \( \Sigma \) are the singular values of \( A \). They are the square roots of the non-zero eigenvalues of both \( A^T A \) and \( AA^T \): \( \sigma_i(A) = \sqrt{\lambda_i(A^T A)} = \sqrt{\lambda_i(AA^T)} \).

  • The SVD writes \( A \) as a sum of rank-1 matrices (the full sum runs to \( \min(m,n) \), but terms with \( \sigma_i=0 \) vanish):

    \[ A = \sum_{i=1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i^T \]

    where \( \mathbf{u}_i \) is the \( i \)-th column of \( U \) and \( \mathbf{v}_i \) is the \( i \)-th column of \( V \).
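
With numpy (sketch; note that `np.linalg.svd` returns \( V^T \), not \( V \)):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)          # U: 4x4, s: 3 singular values, Vt: 3x3
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, A)            # A = U Σ V^T

# Rank-1 expansion: A = Σ_i σ_i u_i v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A_sum, A)
```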

Other Useful Decompositions

  • LU Decomposition: For many square matrices \( A \), one can find a factorization \( A = LU \) (or \( PA=LU \) with a permutation matrix \( P \), included for stability/existence), where \( L \) is a lower triangular matrix (often with 1s on its diagonal, making it unit lower triangular) and \( U \) is an upper triangular matrix. It is commonly used to solve systems \( A\mathbf{x}=\mathbf{b} \) efficiently via forward and backward substitution.

  • QR Decomposition: Any \( A \in \mathbb{R}^{m \times n} \) can be factored as \( A = QR \), where \( Q \in \mathbb{R}^{m \times m} \) is an orthogonal matrix and \( R \in \mathbb{R}^{m \times n} \) is an upper triangular matrix.
    • If \( m \ge n \) (tall or square matrix), a “thin” or “reduced” QR decomposition is often used: \( A = Q_1 R_1 \), where \( Q_1 \in \mathbb{R}^{m \times n} \) has orthonormal columns and \( R_1 \in \mathbb{R}^{n \times n} \) is upper triangular. If \( A \) has full column rank, the columns of \( Q_1 \) form an orthonormal basis for \( \text{Col}(A) \). The decomposition can be computed by applying the Gram-Schmidt process to the columns of \( A \).

  • Cholesky Decomposition: For a real, symmetric, positive definite matrix \( A \), there exists a unique lower triangular matrix \( L \) with strictly positive diagonal entries such that \( A = LL^T \). (Equivalently, \( A=R^T R \), where \( R = L^T \) is upper triangular.) It is often used for solving linear systems with symmetric positive definite coefficient matrices and in statistics (e.g., sampling from multivariate normal distributions).
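
All three factorizations are one call away (sketch; LU here comes from scipy, while QR and Cholesky are in numpy):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))

P, L, U = lu(A)                      # A = P L U, with P a permutation matrix
assert np.allclose(P @ L @ U, A)

Q, R = np.linalg.qr(A)               # Q has orthonormal columns, R upper triangular
assert np.allclose(Q @ R, A)

S = A @ A.T + 4 * np.eye(4)          # symmetric positive definite by construction
Lc = np.linalg.cholesky(S)           # lower triangular, positive diagonal
assert np.allclose(Lc @ Lc.T, S)
```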

  • **Diagonalization (\( A = PDP^{-1} \)):** An \( n \times n \) matrix \( A \) is diagonalizable if and only if it has \( n \) linearly independent eigenvectors. In this case, \( A = PDP^{-1} \), where \( P \) is an invertible matrix whose columns are the eigenvectors of \( A \), and \( D \) is a diagonal matrix whose diagonal entries are the corresponding eigenvalues.
    • A matrix is diagonalizable if and only if, for every eigenvalue \( \lambda \), its geometric multiplicity (the dimension of the eigenspace \( E_\lambda \)) equals its algebraic multiplicity (its multiplicity as a root of the characteristic polynomial).
    • If an \( n \times n \) matrix has \( n \) distinct eigenvalues, it is diagonalizable.
    • Symmetric matrices are a special case where \( P \) can be chosen to be orthogonal (\( Q \)), so \( A=Q\Lambda Q^T \).

7. Positive Definite and Semi-definite Matrices

These definitions apply to symmetric matrices \( A \in \mathbb{R}^{n \times n} \). The quadratic form associated with \( A \) is \( \mathbf{x}^T A \mathbf{x} \).

  • **Positive Definite (\( A \succ 0 \)):**

    \[ \mathbf{x}^T A \mathbf{x} > 0 \quad \text{for all non-zero vectors } \mathbf{x} \in \mathbb{R}^n \]

  • Equivalent conditions:
    1. All eigenvalues of \( A \) are strictly positive (\( \lambda_i > 0 \) for all \( i \)).
    2. All leading principal minors of \( A \) are strictly positive. (The leading principal minor of order \( k \) is the determinant of the top-left \( k \times k \) submatrix.)
    3. A Cholesky decomposition \( A=LL^T \) exists, where \( L \) is lower triangular with strictly positive diagonal entries.

  • **Positive Semi-definite (\( A \succeq 0 \)):**

    \[ \mathbf{x}^T A \mathbf{x} \ge 0 \quad \text{for all vectors } \mathbf{x} \in \mathbb{R}^n \]

  • Equivalent conditions:
    1. All eigenvalues of \( A \) are non-negative (\( \lambda_i \ge 0 \) for all \( i \)).
    2. All principal minors of \( A \) are non-negative. (A principal minor is the determinant of a submatrix obtained by deleting the same set of rows and columns.)

  • **Negative Definite (\( A \prec 0 \)):** \( \mathbf{x}^T A \mathbf{x} < 0 \) for all non-zero \( \mathbf{x} \). (Equivalent to \( -A \succ 0 \); all eigenvalues \( < 0 \).)

  • **Negative Semi-definite (\( A \preceq 0 \)):** \( \mathbf{x}^T A \mathbf{x} \le 0 \) for all \( \mathbf{x} \). (Equivalent to \( -A \succeq 0 \); all eigenvalues \( \le 0 \).)

  • Indefinite: \( A \) is neither positive semi-definite nor negative semi-definite, i.e., the quadratic form \( \mathbf{x}^T A \mathbf{x} \) takes both positive and negative values. (Equivalently, \( A \) has at least one positive eigenvalue and at least one negative eigenvalue.)

  • Connection to Convexity: For a twice-differentiable function \( f: \mathbb{R}^n \to \mathbb{R} \), if the Hessian matrix \( \nabla^2 f(\mathbf{x}) \) is positive semi-definite (positive definite) over a convex domain, then \( f \) is convex (strictly convex) over that domain. This is fundamental in optimization.
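
Two common numerical definiteness tests for a symmetric matrix (sketch):

```python
import numpy as np

A = np.array([[2.0, -1.0], [-1.0, 2.0]])     # symmetric example

# Test 1: all eigenvalues strictly positive  =>  A is positive definite
assert np.all(np.linalg.eigvalsh(A) > 0)

# Test 2: Cholesky succeeds if and only if A is positive definite
try:
    np.linalg.cholesky(A)
    print("positive definite")
except np.linalg.LinAlgError:
    print("not positive definite")
```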

8. Inner Product Spaces

Definition. Inner Product

An inner product on a real vector space \( V \) is a function \( \langle \cdot, \cdot \rangle : V \times V \to \mathbb{R} \) that, for all vectors \( \mathbf{u}, \mathbf{v}, \mathbf{w} \in V \) and any scalar \( c \in \mathbb{R} \), satisfies the following properties:

  1. Symmetry: \( \langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle \)
  2. Linearity in the first argument: \( \langle c\mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = c\langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle \). (Linearity in the second argument follows from symmetry: \( \langle \mathbf{u}, c\mathbf{v} + \mathbf{w} \rangle = c\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle \).)
  3. Positive-definiteness: \( \langle \mathbf{v}, \mathbf{v} \rangle \ge 0 \), and \( \langle \mathbf{v}, \mathbf{v} \rangle = 0 \iff \mathbf{v} = \mathbf{0} \)

The standard dot product in \( \mathbb{R}^n \) (\( \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T \mathbf{v} \)) is an example of an inner product. Another example is the weighted inner product \( \langle \mathbf{u}, \mathbf{v} \rangle_W = \mathbf{u}^T W \mathbf{v} \), where \( W \) is a symmetric positive definite matrix. An inner product induces a norm defined by \( \Vert \mathbf{v} \Vert = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} \).

  • Cauchy-Schwarz Inequality (general form):

    \[ \vert \langle \mathbf{u}, \mathbf{v} \rangle \vert \le \Vert \mathbf{u} \Vert \Vert \mathbf{v} \Vert \]

    Equality holds if and only if \( \mathbf{u} \) and \( \mathbf{v} \) are linearly dependent.

  • Orthogonality: Two vectors \( \mathbf{u} \) and \( \mathbf{v} \) are orthogonal with respect to the inner product if \( \langle \mathbf{u}, \mathbf{v} \rangle = 0 \).

  • Orthonormal Basis: A basis \( \{\mathbf{q}_1, \dots, \mathbf{q}_n\} \) for an \( n \)-dimensional inner product space is orthonormal if \( \langle \mathbf{q}_i, \mathbf{q}_j \rangle = \delta_{ij} \) (Kronecker delta: 1 if \( i=j \), 0 if \( i \ne j \)).

  • Gram-Schmidt Process: An algorithm that takes any basis for an inner product space and constructs an orthonormal basis for that space (a sketch follows below).

  • Parseval’s Identity: If \( \{\mathbf{q}_1, \dots, \mathbf{q}_n\} \) is an orthonormal basis for an inner product space \( V \), then for any vector \( \mathbf{v} \in V \):

    \[ \mathbf{v} = \sum_{i=1}^n \langle \mathbf{v}, \mathbf{q}_i \rangle \mathbf{q}_i \quad \text{and} \quad \Vert \mathbf{v} \Vert^2 = \sum_{i=1}^n \vert \langle \mathbf{v}, \mathbf{q}_i \rangle \vert^2 \]
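
A minimal Gram-Schmidt sketch for the standard dot product on \( \mathbb{R}^m \):

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    Q = V.astype(float).copy()
    for i in range(Q.shape[1]):
        for j in range(i):
            Q[:, i] -= (Q[:, j] @ Q[:, i]) * Q[:, j]  # subtract component along q_j
        Q[:, i] /= np.linalg.norm(Q[:, i])            # normalize
    return Q

V = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
Q = gram_schmidt(V)
assert np.allclose(Q.T @ Q, np.eye(2))                # columns are orthonormal
```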

9. Matrix Calculus

Common derivatives, using the denominator-layout (gradient) convention: the result has the same shape as the variable being differentiated with respect to (e.g., differentiating a scalar w.r.t. a column vector \( \mathbf{x} \) yields a column vector).

| Function \( f \) | Derivative | Notes |
| --- | --- | --- |
| \( \mathbf{a}^T \mathbf{x} \) (or \( \mathbf{x}^T \mathbf{a} \)) | \( \frac{\partial f}{\partial \mathbf{x}} = \mathbf{a} \) | \( \mathbf{x}, \mathbf{a} \in \mathbb{R}^n \). Result is \( n \times 1 \). |
| \( \mathbf{x}^T A \mathbf{x} \) | \( \frac{\partial f}{\partial \mathbf{x}} = (A + A^T)\mathbf{x} \) | \( A \in \mathbb{R}^{n \times n} \). If \( A \) is symmetric, \( 2A\mathbf{x} \). Result is \( n \times 1 \). |
| \( \Vert A\mathbf{x} - \mathbf{b} \Vert_2^2 \) | \( \frac{\partial f}{\partial \mathbf{x}} = 2A^T(A\mathbf{x} - \mathbf{b}) \) | \( A \in \mathbb{R}^{m \times n}, \mathbf{x} \in \mathbb{R}^n, \mathbf{b} \in \mathbb{R}^m \). Result is \( n \times 1 \). |
| \( \text{tr}(AX) \) | \( \frac{\partial f}{\partial X} = A^T \) | \( X \in \mathbb{R}^{k \times l}, A \in \mathbb{R}^{l \times k} \) (so \( AX \) is square). Result is \( k \times l \) (shape of \( X \)). |
| \( \text{tr}(XA) \) | \( \frac{\partial f}{\partial X} = A^T \) | \( X \in \mathbb{R}^{k \times l}, A \in \mathbb{R}^{l \times k} \). Result is \( k \times l \) (shape of \( X \)). |
| \( \text{tr}(AXB) \) | \( \frac{\partial f}{\partial X} = A^T B^T \) | \( X \in \mathbb{R}^{k \times l}, A \in \mathbb{R}^{p \times k}, B \in \mathbb{R}^{l \times p} \). Result is \( k \times l \) (shape of \( X \)). |
| \( \text{tr}(X^T A X) \) | \( \frac{\partial f}{\partial X} = (A+A^T)X \) | \( X \in \mathbb{R}^{k \times l}, A \in \mathbb{R}^{k \times k} \). If \( A \) is symmetric, \( 2AX \). Result is \( k \times l \) (shape of \( X \)). |
| \( \log \det(X) \) | \( \frac{\partial f}{\partial X} = (X^{-1})^T = X^{-T} \) | \( X \in \mathbb{R}^{n \times n} \) positive definite. If \( X \) is symmetric, \( X^{-1} \). Result is \( n \times n \). |
| \( \det(X) \) (Jacobi’s Formula) | \( \frac{\partial f}{\partial X} = \det(X)(X^{-1})^T = \text{adj}(X)^T \) | \( X \in \mathbb{R}^{n \times n} \) invertible. If \( X \) is symmetric, \( \det(X)X^{-1} \). Result is \( n \times n \). |
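
Any entry in this table can be sanity-checked against finite differences; a sketch for two of the vector cases:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
b = rng.standard_normal(n)
eps = 1e-6

def num_grad(f, x):
    """Central-difference approximation of the gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# d/dx (x^T A x) = (A + A^T) x
assert np.allclose(num_grad(lambda z: z @ A @ z, x), (A + A.T) @ x, atol=1e-4)

# d/dx ||Ax - b||_2^2 = 2 A^T (Ax - b)
assert np.allclose(num_grad(lambda z: np.sum((A @ z - b)**2), x),
                   2 * A.T @ (A @ x - b), atol=1e-4)
```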

10. Miscellaneous Identities and Facts

  • Woodbury Matrix Identity (Matrix Inversion Lemma): Allows efficient computation of the inverse of a rank-\( k \) corrected matrix:

    \[ (A + UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1} \]

    where \( A \in \mathbb{R}^{n \times n} \), \( U \in \mathbb{R}^{n \times k} \), \( C \in \mathbb{R}^{k \times k} \), and \( V \in \mathbb{R}^{k \times n} \). Assumes \( A \), \( C \), and \( C^{-1} + VA^{-1}U \) are invertible.

  • Sherman-Morrison Formula (rank-1 update): A special case of the Woodbury identity with \( k=1 \) (i.e., \( U=\mathbf{u} \), \( V=\mathbf{v}^T \), \( C=1 \)):

    \[ (A + \mathbf{u}\mathbf{v}^T)^{-1} = A^{-1} - \frac{A^{-1}\mathbf{u}\mathbf{v}^T A^{-1}}{1 + \mathbf{v}^T A^{-1}\mathbf{u}} \]

    Assumes \( A \) is invertible and \( 1 + \mathbf{v}^T A^{-1}\mathbf{u} \ne 0 \).
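
A numerical check of the rank-1 update (sketch; the diagonal shift keeps the random \( A \) safely invertible):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned in practice
u = rng.standard_normal(n)
v = rng.standard_normal(n)

A_inv = np.linalg.inv(A)
denom = 1.0 + v @ A_inv @ u                       # must be non-zero
update = A_inv - np.outer(A_inv @ u, v @ A_inv) / denom

assert np.allclose(update, np.linalg.inv(A + np.outer(u, v)))
```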

  • Projection Matrix:
    • The matrix that orthogonally projects vectors onto the column space \( \text{Col}(A) \) of a matrix \( A \in \mathbb{R}^{m \times n} \) with linearly independent columns (i.e., \( \text{rank}(A)=n \)) is:

      \[ P_A = A(A^T A)^{-1} A^T \]

    • For any \( \mathbf{b} \in \mathbb{R}^m \), \( P_A\mathbf{b} \) is the vector in \( \text{Col}(A) \) closest to \( \mathbf{b} \).
    • Properties of orthogonal projection matrices: \( P_A^2 = P_A \) (idempotent) and \( P_A^T = P_A \) (symmetric).

  • **Relationship between \( A^TA \) and \( AA^T \) for \( A \in \mathbb{R}^{m \times n} \):**
    • Both \( A^TA \in \mathbb{R}^{n \times n} \) and \( AA^T \in \mathbb{R}^{m \times m} \) are symmetric and positive semi-definite.
    • They have the same non-zero eigenvalues, which are the squares of the non-zero singular values of \( A \).
    • \( \text{rank}(A) = \text{rank}(A^TA) = \text{rank}(AA^T) \).
    • If \( A \) has full column rank (\( \text{rank}(A)=n \le m \)), then \( A^TA \) is positive definite (and thus invertible).
    • If \( A \) has full row rank (\( \text{rank}(A)=m \le n \)), then \( AA^T \) is positive definite (and thus invertible).

---

This list is intended as a compact reference. For deeper understanding, proofs, and further topics, consult standard linear algebra textbooks or Wikipedia.

This post is licensed under CC BY 4.0 by the author.