
Nonlinear Least Squares

Nonlinear least-squares problems arise in curve fitting, parameter estimation, and data assimilation. The goal is to minimize a sum of squared residuals:

\min_{x \in \mathbb{R}^n} \; \|r(x)\|^2 = \sum_{i=1}^{m} r_i(x)^2,

where $r: \mathbb{R}^n \to \mathbb{R}^m$ is a vector-valued residual function with $m \ge n$.
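For example, fitting a model $y \approx a\,e^{b t}$ to data points $(t_i, y_i)$ corresponds to the residuals $r_i(a, b) = a\,e^{b t_i} - y_i$; this is exactly the problem solved in the first code example below.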

The Levenberg-Marquardt (LM) algorithm interpolates between Gauss-Newton and gradient descent using a damping parameter $\lambda$. At each iteration it solves:

\bigl(J^\top J + \lambda \, \mathrm{diag}(J^\top J)\bigr) \, \Delta x = -J^\top r,

where $J \in \mathbb{R}^{m \times n}$ is the Jacobian of the residual. The adaptive damping strategy uses the gain ratio $\rho = \bigl(\|r(x_k)\|^2 - \|r(x_k + \Delta x)\|^2\bigr) / \text{predicted reduction}$, the actual cost decrease divided by the decrease predicted by the linearized model:

  • If $\rho > 0$ (the step reduced the cost): accept the step and decrease $\lambda \leftarrow \lambda \cdot \lambda_\text{down}$.
  • Otherwise: reject the step and increase $\lambda \leftarrow \lambda \cdot \lambda_\text{up}$.

Scaling the damping term by $\mathrm{diag}(J^\top J)$ rather than the identity matrix makes the step invariant to rescaling of the individual parameters.
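To make the accept/reject logic concrete, here is a minimal, self-contained sketch of the damped iteration for a two-parameter linear fit. It illustrates the update rule described above and is not Numra's implementation; the solve_2x2 helper and the fixed iteration count are assumptions of this sketch, and the predicted reduction is the standard $\Delta x^\top(\lambda D\,\Delta x - J^\top r)$ for the cost $\|r\|^2$.

// Illustrative sketch of the LM damping loop (not Numra's implementation).
// Model: r_i(p) = p[0] * t_i + p[1] - y_i, so the Jacobian row i is [t_i, 1].

// Solve a 2x2 linear system by Cramer's rule (assumes a non-singular matrix).
fn solve_2x2(a: [[f64; 2]; 2], b: [f64; 2]) -> [f64; 2] {
    let det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
    [
        (b[0] * a[1][1] - b[1] * a[0][1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]
}

let t = [0.0, 1.0, 2.0, 3.0];
let y = [1.0, 3.1, 4.9, 7.0]; // roughly y = 2t + 1
let residuals = |p: [f64; 2]| -> Vec<f64> {
    t.iter().zip(&y).map(|(&ti, &yi)| p[0] * ti + p[1] - yi).collect()
};
let cost = |r: &[f64]| -> f64 { r.iter().map(|ri| ri * ri).sum() };

let mut p = [0.0_f64, 0.0];
let mut lambda = 1e-3;
let (lambda_up, lambda_down) = (10.0, 0.1);

for _ in 0..20 {
    let r = residuals(p);
    // Accumulate J^T J and J^T r from the Jacobian rows [t_i, 1].
    let mut jtj = [[0.0; 2]; 2];
    let mut jtr = [0.0; 2];
    for (i, &ti) in t.iter().enumerate() {
        let row = [ti, 1.0];
        for a in 0..2 {
            for b in 0..2 {
                jtj[a][b] += row[a] * row[b];
            }
            jtr[a] += row[a] * r[i];
        }
    }
    // Damped normal equations: (J^T J + lambda * diag(J^T J)) dx = -J^T r.
    let mut lhs = jtj;
    for a in 0..2 {
        lhs[a][a] += lambda * jtj[a][a];
    }
    let dx = solve_2x2(lhs, [-jtr[0], -jtr[1]]);

    // Gain ratio: actual cost decrease over the decrease predicted by the
    // linearized model, dx^T (lambda * D * dx - J^T r) for the damped step.
    let p_new = [p[0] + dx[0], p[1] + dx[1]];
    let actual = cost(&r) - cost(&residuals(p_new));
    let predicted = dx[0] * (lambda * jtj[0][0] * dx[0] - jtr[0])
        + dx[1] * (lambda * jtj[1][1] * dx[1] - jtr[1]);
    let rho = actual / predicted.max(f64::MIN_POSITIVE);

    if rho > 0.0 {
        p = p_new;             // accept: cost decreased
        lambda *= lambda_down; // trust the Gauss-Newton direction more
    } else {
        lambda *= lambda_up;   // reject: move toward gradient descent
    }
}
println!("fitted p = {:?}, final lambda = {:.2e}", p, lambda);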

use numra::optim::{lm_minimize, LmOptions};

// Fit y = a * exp(b * t) to data
let t_data: Vec<f64> = (0..10).map(|i| i as f64).collect();
let y_data: Vec<f64> = t_data.iter()
    .map(|&t| 2.0 * (-0.5 * t).exp())
    .collect();
let m = t_data.len();

let residual = |p: &[f64], r: &mut [f64]| {
    for i in 0..m {
        r[i] = p[0] * (p[1] * t_data[i]).exp() - y_data[i];
    }
};
let jacobian = |p: &[f64], jac: &mut [f64]| {
    for i in 0..m {
        let e = (p[1] * t_data[i]).exp();
        jac[i * 2] = e;                        // dr_i/da
        jac[i * 2 + 1] = p[0] * t_data[i] * e; // dr_i/db
    }
};

let result = lm_minimize(
    residual, jacobian, &[1.0, -1.0], m, &LmOptions::default()
).unwrap();

assert!(result.converged);
assert!((result.x[0] - 2.0).abs() < 1e-4); // a ~ 2.0
assert!((result.x[1] + 0.5).abs() < 1e-4); // b ~ -0.5
println!("Final cost: {:.2e}", result.f);  // ||r||^2 ~ 0

The residual function writes into a buffer of length $m$:

// residual: (params, output) -> ()
let residual = |p: &[f64], r: &mut [f64]| {
    r[0] = ...; // first residual
    r[1] = ...; // second residual
};

The Jacobian writes into a row-major buffer of length $m \times n$:

// jacobian: (params, output) -> ()
// jac[i * n + j] = dr_i / dp_j
let jacobian = |p: &[f64], jac: &mut [f64]| {
    jac[0 * 2 + 0] = ...; // dr_0/dp_0
    jac[0 * 2 + 1] = ...; // dr_0/dp_1
    jac[1 * 2 + 0] = ...; // dr_1/dp_0
    // ...
};
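As a concrete instance of that layout, the callbacks for a hypothetical two-parameter model $r_i = p_0 \sin(p_1 t_i) - y_i$ could look like the sketch below (t_data, y_data, and m are assumed to be in scope, as in the exponential example above):

// Example callbacks for r_i = p[0] * sin(p[1] * t_data[i]) - y_data[i]  (n = 2)
let residual = |p: &[f64], r: &mut [f64]| {
    for i in 0..m {
        r[i] = p[0] * (p[1] * t_data[i]).sin() - y_data[i];
    }
};
let jacobian = |p: &[f64], jac: &mut [f64]| {
    for i in 0..m {
        let s = (p[1] * t_data[i]).sin();
        let c = (p[1] * t_data[i]).cos();
        jac[i * 2] = s;                        // dr_i/dp_0
        jac[i * 2 + 1] = p[0] * t_data[i] * c; // dr_i/dp_1
    }
};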

A classic geometric fitting problem: given points on or near a circle, find the center $(c_x, c_y)$ and radius $r$.

use numra::optim::{lm_minimize, LmOptions};
use std::f64::consts::PI;

let n_pts = 20;
let angles: Vec<f64> = (0..n_pts)
    .map(|i| 2.0 * PI * i as f64 / n_pts as f64)
    .collect();
let x_data: Vec<f64> = angles.iter().map(|a| a.cos()).collect();
let y_data: Vec<f64> = angles.iter().map(|a| a.sin()).collect();

let residual = |p: &[f64], r: &mut [f64]| {
    let (cx, cy, rad) = (p[0], p[1], p[2]);
    for i in 0..n_pts {
        let dx = x_data[i] - cx;
        let dy = y_data[i] - cy;
        r[i] = (dx * dx + dy * dy).sqrt() - rad;
    }
};
let jacobian = |p: &[f64], jac: &mut [f64]| {
    let (cx, cy) = (p[0], p[1]);
    for i in 0..n_pts {
        let dx = x_data[i] - cx;
        let dy = y_data[i] - cy;
        let dist = (dx * dx + dy * dy).sqrt();
        jac[i * 3] = -dx / dist;     // dr_i/dcx
        jac[i * 3 + 1] = -dy / dist; // dr_i/dcy
        jac[i * 3 + 2] = -1.0;       // dr_i/dr
    }
};

let result = lm_minimize(
    residual, jacobian, &[0.5, 0.5, 0.5], n_pts, &LmOptions::default()
).unwrap();

assert!(result.converged);
assert!(result.x[0].abs() < 1e-6);         // cx ~ 0
assert!(result.x[1].abs() < 1e-6);         // cy ~ 0
assert!((result.x[2] - 1.0).abs() < 1e-6); // r ~ 1

The OptimProblem builder supports least-squares via .least_squares() and .jacobian():

use numra::optim::OptimProblem;

let t_data = vec![0.0, 1.0, 2.0];
let y_data = vec![1.0, 3.0, 5.0];

let result = OptimProblem::new(2)
    .x0(&[0.0, 0.0])
    .least_squares(3, move |p: &[f64], r: &mut [f64]| {
        for i in 0..3 {
            r[i] = p[0] * t_data[i] + p[1] - y_data[i];
        }
    })
    .solve()
    .unwrap();

assert!(result.converged);
assert!((result.x[0] - 2.0).abs() < 1e-4); // slope
assert!((result.x[1] - 1.0).abs() < 1e-4); // intercept

When no Jacobian is provided, the auto solver uses finite-difference Jacobians computed via central differences ($2n$ residual evaluations per Jacobian).
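Conceptually, a central-difference Jacobian perturbs each parameter by $\pm h$ and differences the resulting residual vectors. A rough sketch of that computation (an illustration, not Numra's internal code; the step-size rule is an assumption of this sketch) is:

// Sketch of a central-difference Jacobian: J[:, j] ~ (r(p + h e_j) - r(p - h e_j)) / (2h),
// costing 2n residual evaluations per Jacobian. Illustrative only.
fn fd_jacobian<F>(residual: &F, p: &[f64], m: usize, jac: &mut [f64])
where
    F: Fn(&[f64], &mut [f64]),
{
    let n = p.len();
    let mut p_plus = p.to_vec();
    let mut p_minus = p.to_vec();
    let mut r_plus = vec![0.0; m];
    let mut r_minus = vec![0.0; m];
    for j in 0..n {
        // Step size scaled to the parameter's magnitude.
        let h = 1e-6 * p[j].abs().max(1.0);
        p_plus[j] = p[j] + h;
        p_minus[j] = p[j] - h;
        residual(&p_plus, &mut r_plus);
        residual(&p_minus, &mut r_minus);
        for i in 0..m {
            jac[i * n + j] = (r_plus[i] - r_minus[i]) / (2.0 * h);
        }
        // Restore before moving to the next column.
        p_plus[j] = p[j];
        p_minus[j] = p[j];
    }
}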

Option         Default   Description
max_iter       200       Maximum iterations
gtol           1e-10     Gradient norm convergence tolerance $\lVert J^\top r\rVert$
xtol           1e-10     Step size convergence tolerance
ftol           1e-12     Relative cost change tolerance
lambda_init    1e-3      Initial damping parameter
lambda_up      10.0      Damping increase factor on rejected step
lambda_down    0.1       Damping decrease factor on accepted step
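Assuming these options are exposed as public fields on LmOptions (with the names in the table; check the LmOptions docs for the actual field types), a solve with a larger iteration budget and a tighter gradient tolerance might be configured like this, reusing the residual, jacobian, and m from the exponential-fit example above:

// Assumes LmOptions exposes the options above as public struct fields.
let opts = LmOptions {
    max_iter: 500,     // allow more iterations than the default 200
    gtol: 1e-12,       // tighter gradient norm tolerance
    lambda_init: 1e-2, // start with slightly more damping
    ..LmOptions::default()
};
let result = lm_minimize(residual, jacobian, &[1.0, -1.0], m, &opts).unwrap();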

The Gauss-Newton method is the special case $\lambda = 0$:

J^\top J \, \Delta x = -J^\top r.
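This system is just the normal equations of the linearized problem: substituting the first-order model $r(x + \Delta x) \approx r(x) + J\,\Delta x$ into the cost and minimizing over $\Delta x$ gives

\min_{\Delta x} \|r(x) + J\,\Delta x\|^2 \;\Longrightarrow\; J^\top J \, \Delta x = -J^\top r,

which is exact for linear residuals and a local approximation otherwise.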

This converges quadratically near a zero-residual solution but can diverge when $J^\top J$ is nearly singular or the residuals are large. Levenberg-Marquardt adds regularization to ensure global convergence:

Property      Gauss-Newton                     LM ($\lambda$ small)   LM ($\lambda$ large)
Direction     Newton-like                      Newton-like            Steepest descent
Convergence   Quadratic (near-zero residual)   Superlinear            Linear
Robustness    Can diverge                      Global convergence     Global convergence

Numra’s LM implementation automatically adjusts $\lambda$ to stay in the superlinear regime when possible.

The result.history field tracks convergence per iteration:

for rec in &result.history {
    println!(
        "iter {}: cost={:.4e}, ||J^T r||={:.2e}, lambda={:.2e}",
        rec.iteration, rec.objective, rec.gradient_norm, rec.step_size
    );
}

A healthy LM run shows the cost decreasing monotonically (on accepted steps) and the gradient norm shrinking. The step_size field tracks the current damping $\lambda$, which should decrease as the algorithm approaches the solution.
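For example, a quick post-run check of the last history record (using only the fields shown above) can flag a run that stopped without meeting the gradient tolerance:

// Inspect the final history record to sanity-check the run.
if let Some(last) = result.history.last() {
    if !result.converged || last.gradient_norm > 1e-6 {
        println!(
            "possible stall at iter {}: ||J^T r|| = {:.2e}, lambda = {:.2e}",
            last.iteration, last.gradient_norm, last.step_size
        );
    }
}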