Automatic Stochastic Differentiation
Introduction
Calculus can be quite tedious when computed symbolically by hand. In many modern applications (for example, in machine learning), automatic differentiation is used to efficiently compute derivatives. The key idea is that when performing differentiation, only first‐order (linear) infinitesimal terms survive; higher-order infinitesimals vanish. In other words, if we denote an infinitesimal by
, then any term like
with
is negligible in the limit.
From the definition of the derivative,
we see that for each small increment, we only need to keep track of the function’s value and its linear change.
This is formalized by working in the ring of dual numbers,
where we think of
as our infinitesimal with the property
(analogous to
).
For example, consider the product of two dual numbers:
Since
, the term
vanishes, leaving
This naturally encodes the product rule:
.
In stochastic calculus, however, a key difference arises: the quadratic variation of a Brownian motion
is nonzero. In fact, if we define
then symbolically we have
This means that when differentiating functions of a Brownian path, second-order terms cannot be dropped as in the deterministic case. To capture this behavior algebraically, we use a larger number system. Instead of
, we work in
with the following interpretation:
- We set the “base differential”
(so that the linear term corresponds to the Brownian increment).
- Since
, we interpret
.
- All higher-order terms vanish:
.
Then, for a stochastic differential equation such as
we write
Functions
are differentiated by writing
and extracting the coefficients of
and
.
For example, using this framework you can derive Itô’s lemma.
Deriving Itô’s Lemma
Using
with
and writing
for a function
we have
A Taylor expansion up to
gives:
and since
this simplifies to
Thus, we identify
which, upon interpreting
as
and
as
, gives the familiar Itô’s lemma:
Stochastic Calculus as Algebra with Dual Numbers
In deterministic automatic differentiation we work with
(since
). For stochastic calculus, because of the nonzero quadratic variation of Brownian motion, we extend our system to
with:
(the Brownian increment), with the property that
,
(all higher-order terms vanish).
Then, as noted, the SDE
is encoded as
For a function
, the differential is computed as
and the coefficients of
and
give the
and
components respectively. This algebraic viewpoint immediately yields Itô’s lemma as shown above.
Stratonovich Equivalence
In many applications—especially in physics—the chain rule in its classical form is desirable. In Itô calculus the chain rule acquires an extra correction term. In Stratonovich calculus the definition is modified so that the chain rule holds as in ordinary calculus.
To motivate this, note that when using the Itô discretization, the Taylor expansion of a function
yields
with
so that the second-order term is nonzero. In order to recover the familiar chain rule,
we modify the increment by using a midpoint (or symmetric) evaluation. In our algebraic language this means that we define the stochastic increment in the Stratonovich sense as
Notice that now
When you Taylor expand
you obtain
Interpreting
as
and
as
, this means
By design, the Stratonovich integral is defined so that the chain rule appears in its usual form:
which means that the
component is taken to be simply
with no additional correction. In other words, the extra term
that appears in the Itô expansion is exactly canceled by the modification in the discretization scheme. This is why many texts write the conversion between the Itô and Stratonovich differentials as
where the extra term accounts for the curvature of the function
.
Why It’s Neat
Encoding both Itô and Stratonovich calculus in the algebraic system
(using
for
and
for
) makes stochastic calculus more accessible to computer algebra systems. In this unified framework the differences between the two interpretations are captured simply by a slight modification of the differential
. In the Stratonovich formulation one uses
which is equivalent to applying a midpoint rule. As a result, the usual (deterministic) chain rule holds without any extra correction.
Here is a complete example using Python and SymPy for the function
.
1
2
3
4
5
6
7
8
9
10
11
12
13
fromsympyimportsymbols,expand,collectX,mu,sigma=symbols('X mu sigma')e=symbols('e')# Itô increment: dX = sigma*e + mu*e**2
dX=sigma*e+mu*e**2X_dual=X+dXf_X=X**2f_X_dX=expand((X_dual)**2)df=f_X_dX-f_Xdf=collect(df,e).subs(e**3,0)# Truncate terms of order 3 and higher
print(df)# 2*X*sigma*e + (2*X*mu + sigma**2)*e**2
This outputs:
which we interpret as
matching Itô’s lemma (with the term
coming from
).
Stratonovich Example
For the Stratonovich version, we adjust the increment:
Then, for any smooth function
, a Taylor expansion yields
Interpreting
as
and
as
, the differential becomes
In the Stratonovich framework the chain rule is taken to hold in its usual form:
which implies that the
term is interpreted as
(i.e. without the extra
term). The difference between the Itô and Stratonovich interpretations is precisely the contribution of that extra term, which is “absorbed” by the modified discretization in the Stratonovich case.
In one dimension, notice that the derivative of
can be written as
so that
Thus, the conversion term
is equivalently written as
where
is simply the derivative of
in one dimension. This is a compact way to express the impact of the state-dependent noise amplitude on the drift when switching from the Itô to the Stratonovich interpretation.
Complete Dual Number Example
Below is a more complete Python example implementing dual numbers to derive Itô’s lemma for
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
importmathclassDualNumber:def__init__(self,real,e=0,e2=0):self.real=real# Standard real part
self.e=e# Coefficient for ε (Brownian increment)
self.e2=e2# Coefficient for ε² (time increment)
def__add__(self,other):ifisinstance(other,DualNumber):returnDualNumber(self.real+other.real,self.e+other.e,self.e2+other.e2)else:returnDualNumber(self.real+other,self.e,self.e2)def__radd__(self,other):returnself.__add__(other)def__mul__(self,other):ifisinstance(other,DualNumber):real=self.real*other.reale=self.real*other.e+self.e*other.real# Here we neglect terms of order 3 and higher (ε³=0)
e2=self.real*other.e2+self.e*other.e+self.e2*other.realreturnDualNumber(real,e,e2)else:returnDualNumber(self.real*other,self.e*other,self.e2*other)def__rmul__(self,other):returnself.__mul__(other)def__sub__(self,other):returnself+(-1*other)def__repr__(self):returnf"DualNumber({self.real} + {self.e} ε + {self.e2} ε²)"defexp(d):ifisinstance(d,DualNumber):a=d.realb=d.ec=d.e2exp_a=math.exp(a)real=exp_ae_term=exp_a*be2_term=exp_a*(b**2/2+c)returnDualNumber(real,e_term,e2_term)else:returnmath.exp(d)# Example usage: derive Itô's lemma for f(X)=X²
X=5.0# Current value of X
mu=2.0# Drift coefficient (dX = mu dt + sigma dB)
sigma=3.0# Diffusion coefficient
# Represent dX as a dual number: sigma ε + mu ε²
dX=DualNumber(0,sigma,mu)X_dual=DualNumber(X)+dX# X + dX
# Compute f(X + dX) = (X + dX)²
f_X_sq=X_dual*X_dual# Subtract f(X) to get df
df_dual=f_X_sq-DualNumber(X**2)print("Differential df:")print(f"df = ({df_dual.e2}) dt + ({df_dual.e}) dB")# Verify with Itô's formula
expected_dt=2*X*mu+sigma**2expected_dB=2*X*sigmaprint("\nExpected from Itô's lemma:")print(f"df = ({expected_dt}) dt + ({expected_dB}) dB")
The output is:
1
2
3
4
5
Differential df:
df = (29.0) dt + (30.0) dB
Expected from Itô's lemma:
df = (29.0) dt + (30.0) dB
This confirms that our dual number framework reproduces Itô’s lemma correctly.