Variational Calculus Part 1: Functionals and the Quest for Optimal Functions

An introduction to variational calculus: exploring functionals, the challenge of optimizing entire functions, and developing the concept of the first variation as a 'derivative' for functionals.

Welcome to our crash course on Variational Calculus! In standard calculus, a cornerstone of optimization is finding points where the derivative of a function vanishes.

Principle. Stationary Points in Single-Variable Calculus

For a ‘nice’ (e.g., continuously differentiable) function \(f(x)\), its local maxima and minima in the interior of its domain can occur only at points \(x_0\) where its derivative vanishes, i.e., \(f'(x_0)=0\). Such points are called stationary or critical points.

But what if the quantity we want to optimize isn’t determined by a single variable \(x\), but by an entire function or path? Consider questions like:

  • What is the shortest path between two points in a plane?
  • What path does a ray of light take through a medium with a continuously varying refractive index? (Fermat’s Principle of Least Time)
  • What is the shape of a soap film spanning a given wire frame that minimizes surface area?

These problems require us to optimize a value that depends on the choice of a function. This is the realm of variational calculus. This field provides the tools to find functions that make certain quantities (called functionals) stationary. It’s a powerful framework with deep roots in physics and engineering, and increasingly relevant in machine learning for understanding regularization, optimal control, and energy-based models.

In this first post, we will:

  1. Introduce functionals through motivating examples.
  2. Frame the core problem: optimizing functionals.
  3. Develop the concept of a variation of a function, which allows us to define a “derivative” for functionals, leading to the first variation.

Assumption: ‘Nice’ Cases

Throughout this crash course, we will generally assume that all functions involved are sufficiently “nice” – meaning they are smooth (continuously differentiable as many times as needed) and well-behaved, allowing us to avoid pathological exceptions and focus on the core concepts. This is a common practice to make the introduction to the subject more accessible.

1. From Functions to Functionals: Motivating Examples

Let’s start by seeing how problems naturally lead to the concept of a functional.

Example 1. Shortest Path Between Two Points

Given two points \((x_1, y_1)\) and \((x_2, y_2)\) in a plane, we want to find the curve \(y(x)\) connecting them that has the shortest possible length. We know intuitively that the answer is a straight line. But how can we derive this formally, especially if the problem were more complex?

The length \(L\) of a curve \(y(x)\) from \(x=x_1\) to \(x=x_2\) is given by the arc length formula from calculus:

\[L = \int_{x_1}^{x_2} \sqrt{1 + (y'(x))^2} \, dx\]

Notice what’s happening here:

  • The input is an entire function \(y(x)\) (representing a path).
  • The output \(L\) is a single real number (the length of that path).

Different functions \(y(x)\) (different paths) will generally yield different lengths \(L\). This “function of a function” is what we call a functional. We denote it as \(L[y]\) or \(L[y(x)]\) to emphasize that its argument is a function.
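To make this concrete, here is a minimal numerical sketch (plain Python with midpoint-rule integration; the helper names are ours, not standard). It evaluates the arc-length functional on two different paths between \((0,0)\) and \((1,1)\) and confirms that the straight line is shorter than a parabolic detour:

```python
import math

def arc_length(y_prime, x1, x2, n=10_000):
    """Approximate L[y] = ∫ sqrt(1 + y'(x)^2) dx with the midpoint rule."""
    h = (x2 - x1) / n
    total = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * h
        total += math.sqrt(1.0 + y_prime(x) ** 2) * h
    return total

# Two paths from (0, 0) to (1, 1), each specified by its derivative y'(x):
line = arc_length(lambda x: 1.0, 0.0, 1.0)      # y(x) = x  (straight line)
parabola = arc_length(lambda x: 2.0 * x, 0.0, 1.0)  # y(x) = x² (a detour)

# line ≈ sqrt(2) ≈ 1.4142, while the parabola comes out longer (≈ 1.479).
```

Each choice of \(y(x)\) is one "input" to the functional; the scalar it returns is what we compare across paths.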

Example 2. Fermat’s Principle of Least Time (Simplified)

Imagine light traveling from point A to point B through a medium where its speed \(v\) can vary depending on position. For instance, let speed depend only on the horizontal coordinate \(x\), so \(v = v(x)\). Fermat’s principle states that light takes the path that minimizes travel time.

If the path is described by a function \(y(x)\), an infinitesimal segment of the path has length \(ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + (y'(x))^2} \, dx\). The time taken to traverse this segment is \(dt = ds / v(x)\). The total time \(T\) to travel from \(x=a\) to \(x=b\) along the path \(y(x)\) is:

\[T[y] = \int_a^b \frac{\sqrt{1 + (y'(x))^2}}{v(x)} \, dx\]

Again, the input is the path function \(y(x)\), and the output \(T[y]\) is the total travel time (a scalar). Unlike the straight-line case, the minimizing path here is generally not obvious and depends on the form of \(v(x)\). This highlights the need for a systematic method to find such optimal functions.
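As a quick numerical illustration, we can evaluate \(T[y]\) for a hypothetical speed profile \(v(x) = 1 + x\) (chosen purely for this sketch, so the medium is faster at larger \(x\)). A path that saves most of its vertical climb for the fast region can beat the straight line:

```python
import math

def travel_time(y_prime, v, a, b, n=10_000):
    """Approximate T[y] = ∫ sqrt(1 + y'(x)^2) / v(x) dx (midpoint rule)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += math.sqrt(1.0 + y_prime(x) ** 2) / v(x) * h
    return total

v = lambda x: 1.0 + x  # hypothetical speed profile: light moves faster at larger x

# Two paths from (0, 0) to (1, 1):
straight = travel_time(lambda x: 1.0, v, 0.0, 1.0)      # y(x) = x
bent = travel_time(lambda x: 2.0 * x, v, 0.0, 1.0)      # y(x) = x², climbs late

# The bent path does its climbing where v(x) is large and edges out the line.
```

Neither path is claimed to be the true minimizer; the point is only that different admissible functions yield different values of \(T[y]\), and the ordering is not obvious in advance.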

These examples lead us to a general definition:

Definition. Functional

A functional \(J\) is a mapping from a vector space of functions \(\mathcal{Y}\) (the space of admissible functions) to the real numbers \(\mathbb{R}\). If \(y(x)\) is a function in \(\mathcal{Y}\), then \(J[y]\) denotes the scalar value of the functional evaluated at \(y\).

The square brackets \(J[y]\) are conventionally used to distinguish functionals from ordinary functions \(f(x)\).

Single-Variable Calculus vs. Calculus of Variations

  • Single-Variable Calculus: Optimizes a function \(f(x)\) where \(x\) is a variable (a point).
  • Calculus of Variations: Optimizes a functional \(J[y]\) where \(y\) is a function (a curve, path, etc.).

2. The Core Problem: Optimizing Functionals

The fundamental goal of variational calculus is to find a function \(y_0(x)\) from a specified class of admissible functions that makes a given functional \(J[y]\) stationary (i.e., a minimum, maximum, or saddle point).

This is analogous to finding critical points in ordinary calculus, but the “variable” we are optimizing over is now an entire function, which can be thought of as a point in an infinite-dimensional function space.

3. Developing the “Derivative”: The Concept of Variations

How do we find the “derivative” of a functional \(J[y]\) with respect to the function \(y(x)\)? This is the key question. We can’t simply differentiate with respect to \(y\) as if it were an ordinary variable: \(y\) is a function, and \(J[y]\) typically depends not only on the value \(y(x)\) at each point but also on derivatives such as \(y'(x)\), and on the function’s values across an entire interval via an integral.

Let’s draw inspiration from how derivatives are defined. The derivative of \(f(x)\) tells us how \(f\) changes for an infinitesimal change in \(x\). For functionals, we need to see how \(J[y]\) changes when we make a small “perturbation” or “variation” to the entire function \(y(x)\).

Consider a candidate function \(y(x)\) that we suspect might extremize \(J[y]\). We create a “nearby” or “varied” function \(\tilde{y}(x)\) by adding a small, arbitrary perturbation:

\[\tilde{y}(x; \epsilon) = y(x) + \epsilon \eta(x)\]

Let’s break this down:

  • \(y(x)\): The function we are testing for extremality.
  • \(\eta(x)\): An arbitrary, sufficiently smooth function called the variation function or test function. It represents the “direction” in the space of functions along which we are perturbing \(y(x)\).
  • \(\epsilon\): A small real number (a scalar parameter). As \(\epsilon \to 0\), the perturbed function \(\tilde{y}(x; \epsilon)\) approaches \(y(x)\).

Boundary Conditions for Variations: If the problem requires \(y(x)\) to satisfy fixed boundary conditions, say \(y(a) = y_a\) and \(y(b) = y_b\), then any admissible perturbed function \(\tilde{y}(x; \epsilon)\) must also satisfy these same boundary conditions for all \(\epsilon\). Since \(y(x) + \epsilon \eta(x)\) must equal \(y_a\) at \(x=a\), and \(y(a)=y_a\), this implies \(\epsilon \eta(a) = 0\). Similarly, \(\epsilon \eta(b) = 0\). For these to hold for any small non-zero \(\epsilon\), the variation function \(\eta(x)\) must itself vanish at the boundaries:

\[\eta(a) = 0 \quad \text{and} \quad \eta(b) = 0\]

Such an \(\eta(x)\) is called an admissible variation for problems with fixed endpoints.
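One common concrete choice (among many) of admissible variation is \(\eta(x) = \sin\!\big(k\pi(x-a)/(b-a)\big)\), which vanishes at both endpoints for every integer \(k\). A small sketch verifying that such perturbations leave the boundary values untouched:

```python
import math

def eta(x, a=0.0, b=1.0, k=1):
    """An admissible variation: sin(kπ(x-a)/(b-a)) vanishes at x = a and x = b."""
    return math.sin(k * math.pi * (x - a) / (b - a))

def perturbed(y, x, eps, a=0.0, b=1.0, k=1):
    """The varied function ỹ(x; ε) = y(x) + ε·η(x)."""
    return y(x) + eps * eta(x, a, b, k)

y = lambda x: x  # a candidate satisfying the boundary conditions y(0)=0, y(1)=1

# The perturbation respects the boundary conditions for every ε:
for eps in (0.1, -0.3, 0.7):
    assert abs(perturbed(y, 0.0, eps) - 0.0) < 1e-9  # still y(0) = 0
    assert abs(perturbed(y, 1.0, eps) - 1.0) < 1e-9  # still y(1) = 1
```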

Analogy. Directional Derivatives in Multivariable Calculus

This approach is very similar to how directional derivatives are defined for a multivariable function \(f(\mathbf{x})\) where \(\mathbf{x} \in \mathbb{R}^n\). To find the rate of change of \(f\) at \(\mathbf{x}_0\) in the direction of a vector \(\mathbf{v}\), we consider the function \(g(\epsilon) = f(\mathbf{x}_0 + \epsilon \mathbf{v})\). The directional derivative is then \(g'(0)\).

In our case:

  • The function \(y(x)\) is analogous to the point \(\mathbf{x}_0\).
  • The variation function \(\eta(x)\) is analogous to the direction vector \(\mathbf{v}\).
  • The scalar \(\epsilon\) is analogous to the step size.
  • The functional \(J[y]\) is analogous to \(f(\mathbf{x})\).
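The analogy can be made concrete with a short sketch of the directional derivative itself, approximated by a central finite difference (the helper name is illustrative):

```python
def directional_derivative(f, x0, v, h=1e-6):
    """g'(0) for g(ε) = f(x0 + ε·v), estimated by a central difference."""
    g = lambda eps: f([xi + eps * vi for xi, vi in zip(x0, v)])
    return (g(h) - g(-h)) / (2.0 * h)

f = lambda x: x[0] ** 2 + 3.0 * x[1]  # f(x, y) = x² + 3y, so ∇f = (2x, 3)

d = directional_derivative(f, [1.0, 2.0], [1.0, 0.0])
# At (1, 2) along (1, 0), the directional derivative is ∇f·v = 2.
```

The first variation below is built the same way, with the point \(\mathbf{x}_0\) replaced by the function \(y\) and the direction \(\mathbf{v}\) replaced by \(\eta\).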

Now, if we substitute \(\tilde{y}(x; \epsilon) = y(x) + \epsilon \eta(x)\) into the functional \(J[y]\), the value of the functional becomes a regular function of the single real variable \(\epsilon\) (assuming \(y\) and \(\eta\) are fixed):

\[\Phi(\epsilon) = J[y + \epsilon \eta]\]

If \(y(x)\) is indeed an extremizing function for \(J[y]\), then for any choice of admissible variation \(\eta(x)\), the function \(\Phi(\epsilon)\) must have a stationary point at \(\epsilon = 0\). From ordinary calculus, this means its derivative with respect to \(\epsilon\) must be zero at \(\epsilon = 0\):

\[\left. \frac{d\Phi(\epsilon)}{d\epsilon} \right\vert_{\epsilon=0} = \left. \frac{d}{d\epsilon} J[y + \epsilon \eta] \right\vert_{\epsilon=0} = 0\]

This derivative is precisely what we call the first variation of the functional \(J\) at \(y\) in the “direction” \(\eta\).

Definition. The First Variation (or Gâteaux Derivative)

The first variation of a functional \(J[y]\) at the function \(y\) with respect to a variation function \(\eta(x)\) (often denoted \(\delta y = \epsilon \eta\) for infinitesimal \(\epsilon\)) is given by:

\[\delta J[y; \eta] = \left. \frac{d}{d\epsilon} J[y + \epsilon \eta] \right\vert_{\epsilon=0}\]

A necessary condition for \(y(x)\) to be an extremizer of \(J[y]\) (among functions satisfying the given boundary conditions) is that its first variation must be zero for all admissible variation functions \(\eta(x)\):

\[\delta J[y; \eta] = 0 \quad \text{for all admissible } \eta(x)\]

Remark. The notation \(\delta J\) is common. Sometimes it refers to \(\delta J[y; \eta]\) (the derivative itself), and sometimes it informally refers to the principal linear part of the change \(\Delta J = J[y+\epsilon\eta] - J[y] \approx \epsilon \cdot \delta J[y;\eta]\). The definition using the derivative with respect to \(\epsilon\) is the most practical for calculations.

This condition, \(\delta J[y; \eta] = 0\) for all admissible \(\eta\), is the cornerstone of variational calculus. It’s the direct analogue of \(f'(x)=0\) for finding extrema of ordinary functions. The power of this condition comes from the requirement that it must hold for every possible (admissible) way of varying the function.
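We can check this condition numerically for the arc-length functional of Example 1 on \([0,1]\). The sketch below (helper names are ours) approximates \(\Phi'(0)\) by a central difference, using the admissible variation \(\eta(x) = \sin(\pi x)\): the estimate vanishes for the straight line \(y(x)=x\) but not for the parabola \(y(x)=x^2\), consistent with the straight line being the extremal.

```python
import math

def J(y_prime, n=20_000):
    """Arc-length functional J[y] = ∫₀¹ sqrt(1 + y'(x)²) dx (midpoint rule)."""
    h = 1.0 / n
    return sum(math.sqrt(1.0 + y_prime((i + 0.5) * h) ** 2) * h for i in range(n))

# η(x) = sin(πx) satisfies η(0) = η(1) = 0; we only need its derivative below.
eta_prime = lambda x: math.pi * math.cos(math.pi * x)

def phi(eps, y_prime):
    """Φ(ε) = J[y + ε·η], evaluated through the derivative y' + ε·η'."""
    return J(lambda x: y_prime(x) + eps * eta_prime(x))

def dphi0(y_prime, h=1e-5):
    """Central-difference estimate of the first variation Φ'(0)."""
    return (phi(h, y_prime) - phi(-h, y_prime)) / (2.0 * h)

# Straight line y(x) = x: Φ'(0) vanishes (up to numerical noise).
# Parabola y(x) = x²: Φ'(0) is clearly nonzero for this η.
```

A single \(\eta\) failing to produce zero is enough to rule out a candidate; establishing extremality requires \(\Phi'(0) = 0\) for all admissible \(\eta\), which is exactly what the Euler-Lagrange machinery of the next post will handle.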

4. What’s Next?

We’ve established what functionals are and introduced the first variation as a way to detect stationary “points” (which are actually functions!) in the landscape defined by a functional. The crucial necessary condition for an extremum is \(\delta J[y; \eta] = 0\) for all admissible \(\eta(x)\).

But how do we use this condition? It still seems abstract. In the next post, we will:

  1. Consider the common form of functionals: \(J[y] = \int_a^b F(x, y(x), y'(x)) \, dx\).
  2. Explicitly calculate \(\delta J[y; \eta]\) for such functionals.
  3. Use the fact that \(\delta J = 0\) must hold for all admissible \(\eta(x)\), along with a key result called the Fundamental Lemma of Variational Calculus, to derive the famous Euler-Lagrange equation.

The Euler-Lagrange equation is a differential equation that the extremizing function \(y(x)\) must satisfy. Solving this differential equation (subject to boundary conditions) will give us the candidate functions that make the functional stationary.

Stay tuned as we turn this abstract condition into a concrete computational tool!

This post is licensed under CC BY 4.0 by the author.