
Adjoint Methods

The adjoint method computes the gradient of an objective function with respect to parameters in $O(1)$ backward integrations, regardless of the number of parameters. This makes it dramatically more efficient than forward sensitivity when there are many parameters (as in machine learning or large-scale parameter estimation).

Given a parameterized ODE:

$$\frac{dx}{dt} = f(t, x, p), \quad x(t_0) = x_0$$

and an objective:

$$J = \phi(x(t_f)) + \int_{t_0}^{t_f} L(t, x, p) \, dt$$

compute the gradient $dJ/dp$.
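
As a concrete instance (an illustrative example added here, not taken from the library documentation): with scalar dynamics $\dot{x} = -p\,x$, $x(t_0) = x_0$, no running cost, and $\phi(x(t_f)) = x(t_f)$, the solution is $x(t_f) = x_0\, e^{-p (t_f - t_0)}$, so

$$\frac{dJ}{dp} = -(t_f - t_0)\, x_0\, e^{-p (t_f - t_0)}.$$

The methods below recover this quantity without ever forming the closed-form solution.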

Forward sensitivity propagates derivatives forward alongside the state:

$$\frac{d}{dt}\frac{\partial x}{\partial p} = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial p} + \frac{\partial f}{\partial p}$$

This requires solving $n_x \times n_p$ additional equations, which becomes expensive when $n_p$ is large.
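
To make that cost concrete, here is a minimal hand-rolled sketch of forward sensitivity for the 1-state, 2-parameter model used in the example further below ($dx/dt = -p_0 x + p_1 \sin t$). It uses an explicit Euler integrator and hypothetical helper names; it is an illustration, not numra's implementation:

```rust
// Forward-sensitivity sketch (assumption: hand-rolled explicit Euler, not part of numra).
// Model: dx/dt = -p0*x + p1*sin(t); the sensitivities s_i = dx/dp_i obey
//   ds0/dt = -p0*s0 - x,        ds1/dt = -p0*s1 + sin(t)
// i.e. n_x * n_p = 2 extra ODEs are integrated alongside the single state.
fn forward_sensitivity(p: [f64; 2], t0: f64, tf: f64, x0: f64, steps: usize) -> (f64, [f64; 2]) {
    let dt = (tf - t0) / steps as f64;
    let (mut x, mut s) = (x0, [0.0_f64; 2]); // x0 does not depend on p, so s(t0) = 0
    let mut t = t0;
    for _ in 0..steps {
        let dx = -p[0] * x + p[1] * t.sin();
        let ds0 = -p[0] * s[0] - x;          // df/dx * s0 + df/dp0
        let ds1 = -p[0] * s[1] + t.sin();    // df/dx * s1 + df/dp1
        x += dt * dx;
        s[0] += dt * ds0;
        s[1] += dt * ds1;
        t += dt;
    }
    (x, s) // x(tf) and dx(tf)/dp
}

fn main() {
    let (x_tf, s_tf) = forward_sensitivity([0.5, 1.0], 0.0, 10.0, 1.0, 20_000);
    // For phi(x) = x^2 with no running cost, dJ/dp = 2 * x(tf) * dx(tf)/dp.
    println!("dJ/dp ≈ [{:.6}, {:.6}]", 2.0 * x_tf * s_tf[0], 2.0 * x_tf * s_tf[1]);
}
```

Every additional parameter adds another sensitivity ODE to the forward system, which is the $n_x \times n_p$ scaling noted above.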

The adjoint method instead introduces the costate $\lambda(t)$ and integrates it backward in time:

$$\frac{d\lambda}{dt} = -\left(\frac{\partial f}{\partial x}\right)^{T} \lambda - \left(\frac{\partial L}{\partial x}\right)^{T}$$

with terminal condition $\lambda(t_f) = \nabla_x \phi(x(t_f))$.
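
One way to see where this system comes from (a standard Lagrange-multiplier argument, summarized here for context rather than taken from the library documentation) is to append the dynamics to the objective with a multiplier $\lambda(t)$,

$$\mathcal{L} = \phi(x(t_f)) + \int_{t_0}^{t_f} \Big[ L + \lambda^{T}\big(f(t, x, p) - \dot{x}\big) \Big] \, dt,$$

differentiate with respect to $p$, and integrate the $\lambda^{T}\dot{x}$ term by parts (using $\partial x/\partial p\,(t_0) = 0$, since $x_0$ does not depend on $p$):

$$\frac{d\mathcal{L}}{dp} = \nabla_p \phi + \int_{t_0}^{t_f} \Big[ \frac{\partial L}{\partial p} + \lambda^{T}\frac{\partial f}{\partial p} \Big] dt + \int_{t_0}^{t_f} \Big[ \frac{\partial L}{\partial x} + \lambda^{T}\frac{\partial f}{\partial x} + \dot{\lambda}^{T} \Big] \frac{\partial x}{\partial p} \, dt + \Big[ \nabla_x \phi^{T} - \lambda^{T}(t_f) \Big] \frac{\partial x}{\partial p}(t_f).$$

Choosing $\lambda$ to satisfy the costate ODE and terminal condition above makes both bracketed coefficients of the (expensive) sensitivity $\partial x/\partial p$ vanish, and what remains is exactly the gradient expression that follows.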

The gradient is then:

$$\frac{dJ}{dp} = \nabla_p \phi + \int_{t_0}^{t_f} \left[\left(\frac{\partial f}{\partial p}\right)^{T} \lambda + \frac{\partial L}{\partial p}\right] dt$$

This requires only one backward integration, regardless of $n_p$.
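
For contrast with the forward-sensitivity sketch above, here is a minimal end-to-end sketch of the adjoint procedure for the same model: a forward pass that stores the trajectory, a backward costate pass, and the gradient quadrature. Again this is a hand-rolled explicit Euler illustration with hypothetical names, not numra's API, and it assumes $\phi(x(t_f)) = x(t_f)^2$ with $L = 0$:

```rust
// End-to-end adjoint sketch (assumption: hand-rolled explicit Euler, not numra's integrator),
// for dx/dt = -p0*x + p1*sin(t), phi(x(tf)) = x(tf)^2, L = 0.
fn adjoint_gradient_sketch(p: [f64; 2], t0: f64, tf: f64, x0: f64, steps: usize) -> [f64; 2] {
    let dt = (tf - t0) / steps as f64;

    // Forward pass: store the state trajectory needed by the backward pass.
    let mut xs = Vec::with_capacity(steps + 1);
    let mut x = x0;
    xs.push(x);
    for k in 0..steps {
        let t = t0 + k as f64 * dt;
        x += dt * (-p[0] * x + p[1] * t.sin());
        xs.push(x);
    }

    // Backward pass: d(lambda)/dt = -(df/dx)^T lambda = p0 * lambda, lambda(tf) = 2*x(tf),
    // while accumulating dJ/dp_i = integral of (df/dp_i)^T * lambda dt.
    let mut lambda = 2.0 * xs[steps];
    let mut grad = [0.0_f64; 2];
    for k in (0..steps).rev() {
        let t = t0 + k as f64 * dt;
        let x_k = xs[k];
        grad[0] += dt * (-x_k) * lambda;   // df/dp0 = -x
        grad[1] += dt * t.sin() * lambda;  // df/dp1 = sin(t)
        lambda -= dt * (p[0] * lambda);    // step the costate backward in time
    }
    grad
}

fn main() {
    let grad = adjoint_gradient_sketch([0.5, 1.0], 0.0, 10.0, 1.0, 20_000);
    println!("dJ/dp ≈ [{:.6}, {:.6}]", grad[0], grad[1]);
}
```

Only one extra ODE (the costate) is integrated here, no matter how many parameters appear in the gradient quadrature.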

```rust
use numra::ocp::adjoint_gradient;

// Model: dx/dt = -p[0]*x + p[1]*sin(t)
let gradient_result = adjoint_gradient(
    // model: f(t, x, dxdt, params)
    |t, x, dxdt, p| {
        dxdt[0] = -p[0] * x[0] + p[1] * t.sin();
    },
    1,            // n_states
    2,            // n_params
    0.0,          // t0
    10.0,         // tf
    &[1.0],       // x0
    &[0.5, 1.0],  // params
    // terminal cost: phi(x(tf))
    |x_tf| x_tf[0] * x_tf[0],
    // running cost (optional): L(t, x, p)
    Some(|_t: f64, _x: &[f64], _p: &[f64]| 0.0),
).unwrap();

println!("Objective: {:.6}", gradient_result.objective);
println!("dJ/dp = {:?}", gradient_result.gradient);
```

| Method | Cost | Best when |
| --- | --- | --- |
| Forward sensitivity | $O(n_x \times n_p)$ | $n_p \leq n_x$ |
| Adjoint | $O(n_x)$ backward | $n_p \gg n_x$ |
| Finite differences | $O(n_p)$ forward solves | Quick & dirty |

For a 3-state system with 100 parameters:

  • Forward: 300 additional ODEs
  • Adjoint: 3 backward ODEs + gradient quadrature
  • Finite differences: 100 full ODE solves (a central-difference sketch follows below)
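
As a point of comparison, a central-difference gradient needs two full forward solves per parameter. The sketch below, for the same 1-state model, uses a hand-rolled Euler solver and hypothetical names rather than numra's API:

```rust
// Finite-difference sketch (assumption: hand-rolled Euler solver, not numra's API).
// Each parameter costs two full forward solves, so the work grows linearly with n_p.
fn solve_objective(p: &[f64; 2], t0: f64, tf: f64, x0: f64, steps: usize) -> f64 {
    let dt = (tf - t0) / steps as f64;
    let mut x = x0;
    for k in 0..steps {
        let t = t0 + k as f64 * dt;
        x += dt * (-p[0] * x + p[1] * t.sin());
    }
    x * x // J = phi(x(tf)) = x(tf)^2, no running cost
}

fn main() {
    let p = [0.5_f64, 1.0];
    let h = 1e-6;
    let mut grad = [0.0_f64; 2];
    for i in 0..2 {
        let (mut plus, mut minus) = (p, p);
        plus[i] += h;
        minus[i] -= h;
        grad[i] = (solve_objective(&plus, 0.0, 10.0, 1.0, 20_000)
            - solve_objective(&minus, 0.0, 10.0, 1.0, 20_000))
            / (2.0 * h);
    }
    println!("dJ/dp ≈ {:?}", grad);
}
```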

| Field | Description |
| --- | --- |
| `gradient` | $dJ/dp$ vector |
| `objective` | Scalar objective $J$ |
| `costate` | Costate trajectory $\lambda(t)$ |
| `costate_time` | Time points for the costate trajectory |

  • Jacobians $\partial f/\partial x$ and $\partial f/\partial p$ are computed internally via finite differences.
  • The forward state trajectory must be stored or recomputed during the backward pass. Numra stores the trajectory from the forward integration.
  • For problems with discontinuous dynamics, the adjoint equations need jump conditions at the discontinuity points (one common case is sketched below).
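
As an illustration of what such a jump condition looks like (a general statement, not a description of numra's behavior): if the state undergoes a jump at a fixed time $t_d$ via $x(t_d^+) = g(x(t_d^-))$, the costate is typically propagated across the event as

$$\lambda(t_d^-) = \left(\frac{\partial g}{\partial x}\right)^{T} \lambda(t_d^+),$$

with additional terms required when the event time itself depends on the state or the parameters.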