# Unconstrained Optimization
This chapter covers algorithms for solving

$$\min_{x \in \mathbb{R}^n} f(x)$$

without any constraints on the decision variables. Numra provides four unconstrained minimizers spanning the spectrum from gradient-based quasi-Newton methods to derivative-free direct search.
## BFGS Quasi-Newton Method

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm maintains a dense approximation to the inverse Hessian and updates it each iteration with a rank-2 correction:

$$H_{k+1} = \left(I - \rho_k s_k y_k^T\right) H_k \left(I - \rho_k y_k s_k^T\right) + \rho_k s_k s_k^T$$

where $s_k = x_{k+1} - x_k$, $y_k = \nabla f_{k+1} - \nabla f_k$, and $\rho_k = 1 / (y_k^T s_k)$. A Wolfe line search determines the step length $\alpha_k$ at each iteration.
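To make the update concrete, here is a minimal sketch of the rank-2 correction applied to a dense matrix stored as nested `Vec`s. This is illustrative only, not Numra’s internal code; the helper name `bfgs_update` is hypothetical.

```rust
/// Apply the BFGS rank-2 inverse-Hessian update in place.
/// `h` is the current inverse-Hessian approximation H_k (assumed symmetric),
/// `s` is x_{k+1} - x_k, and `y` is the gradient difference.
fn bfgs_update(h: &mut Vec<Vec<f64>>, s: &[f64], y: &[f64]) {
    let n = s.len();
    let rho = 1.0 / s.iter().zip(y).map(|(si, yi)| si * yi).sum::<f64>();
    // hy = H_k * y
    let hy: Vec<f64> = (0..n)
        .map(|i| (0..n).map(|j| h[i][j] * y[j]).sum())
        .collect();
    let yhy: f64 = y.iter().zip(&hy).map(|(yi, hyi)| yi * hyi).sum();
    // Expanded form of (I - rho*s*y^T) H (I - rho*y*s^T) + rho*s*s^T
    for i in 0..n {
        for j in 0..n {
            h[i][j] += -rho * (s[i] * hy[j] + hy[i] * s[j])
                + (rho * rho * yhy + rho) * s[i] * s[j];
        }
    }
}
```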
### Functional API

```rust
use numra::optim::{bfgs_minimize, OptimOptions};

// Rosenbrock function: f(x) = (1 - x0)^2 + 100*(x1 - x0^2)^2
let f = |x: &[f64]| {
    (1.0 - x[0]).powi(2) + 100.0 * (x[1] - x[0] * x[0]).powi(2)
};
let grad = |x: &[f64], g: &mut [f64]| {
    g[0] = -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] * x[0]);
    g[1] = 200.0 * (x[1] - x[0] * x[0]);
};

let opts = OptimOptions::default().gtol(1e-8).max_iter(2000);
let result = bfgs_minimize(f, grad, &[-1.0, 1.0], &opts).unwrap();

assert!(result.converged);
assert!((result.x[0] - 1.0).abs() < 1e-5);
assert!((result.x[1] - 1.0).abs() < 1e-5);
println!("Iterations: {}, f_evals: {}", result.iterations, result.n_feval);
```

### Struct API
The `Bfgs` struct wraps the same algorithm with an object-oriented interface:
```rust
use numra::optim::{Bfgs, OptimOptions};

let bfgs = Bfgs::new(OptimOptions::default().gtol(1e-10));
let result = bfgs.minimize(
    |x: &[f64]| x[0] * x[0] + x[1] * x[1],
    |x: &[f64], g: &mut [f64]| {
        g[0] = 2.0 * x[0];
        g[1] = 2.0 * x[1];
    },
    &[3.0, 4.0],
).unwrap();

assert!(result.converged);
```

### Convergence Properties
BFGS converges superlinearly on smooth strongly convex functions. On the $n$-dimensional Rosenbrock function the iteration count is typically almost independent of $n$, because the inverse Hessian approximation captures the curvature of the narrow valley.

Storage cost: $O(n^2)$ for the dense inverse Hessian. For large $n$, use L-BFGS instead.
## L-BFGS (Limited-Memory BFGS)

L-BFGS replaces the dense inverse Hessian with a compact representation built from the $m$ most recent correction pairs $(s_i, y_i)$. The search direction is computed via the two-loop recursion in $O(mn)$ time and storage:

$$d_k = -H_k \nabla f_k$$

where $H_k^0 = \gamma_k I$ with $\gamma_k = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}$ providing automatic scaling.
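As a concrete illustration, here is a minimal sketch of the two-loop recursion, assuming the correction pairs are stored oldest-first in `s_list`/`y_list`. The function name and storage layout are hypothetical, not Numra’s internals.

```rust
/// Compute the L-BFGS search direction -H_k * grad via the two-loop recursion.
fn two_loop(grad: &[f64], s_list: &[Vec<f64>], y_list: &[Vec<f64>]) -> Vec<f64> {
    let dot = |a: &[f64], b: &[f64]| a.iter().zip(b).map(|(x, y)| x * y).sum::<f64>();
    let m = s_list.len();
    let mut q: Vec<f64> = grad.to_vec();
    let mut alphas = vec![0.0; m];
    // First loop: newest to oldest
    for i in (0..m).rev() {
        let rho = 1.0 / dot(&y_list[i], &s_list[i]);
        alphas[i] = rho * dot(&s_list[i], &q);
        for (qj, yj) in q.iter_mut().zip(&y_list[i]) { *qj -= alphas[i] * yj; }
    }
    // Initial scaling: H_k^0 = gamma_k * I
    if m > 0 {
        let gamma = dot(&s_list[m - 1], &y_list[m - 1]) / dot(&y_list[m - 1], &y_list[m - 1]);
        for qj in q.iter_mut() { *qj *= gamma; }
    }
    // Second loop: oldest to newest
    for i in 0..m {
        let rho = 1.0 / dot(&y_list[i], &s_list[i]);
        let beta = rho * dot(&y_list[i], &q);
        for (qj, sj) in q.iter_mut().zip(&s_list[i]) { *qj += (alphas[i] - beta) * sj; }
    }
    // Search direction is the negative of the result
    q.iter().map(|v| -v).collect()
}
```

In practice you call `lbfgs_minimize`: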
```rust
use numra::optim::{lbfgs_minimize, LbfgsOptions};

// Large-scale quadratic: f(x) = sum(x_i^2), n = 100
let n = 100;
let f = |x: &[f64]| x.iter().map(|xi| xi * xi).sum::<f64>();
let grad = |x: &[f64], g: &mut [f64]| {
    for i in 0..x.len() {
        g[i] = 2.0 * x[i];
    }
};

let x0: Vec<f64> = (1..=n).map(|i| i as f64 * 0.1).collect();
let opts = LbfgsOptions::default().memory(10).gtol(1e-10);
let result = lbfgs_minimize(f, grad, &x0, &opts).unwrap();

assert!(result.converged);
assert!(result.f < 1e-10);
```

### Key Options
| Option | Default | Description |
|---|---|---|
| `memory` | 10 | Number of correction pairs to store |
| `gtol` | 1e-8 | Gradient norm convergence tolerance |
| `max_iter` | 1000 | Maximum number of iterations |
| `ftol` | 1e-12 | Relative function change tolerance |
**When to use L-BFGS over BFGS:** Use L-BFGS when $n$ is large. For small problems, BFGS can converge in fewer iterations because the full inverse Hessian captures curvature more accurately.
## Nelder-Mead Simplex Method

The Nelder-Mead method is a derivative-free optimizer that maintains a simplex of $n + 1$ vertices and transforms it via four operations: reflection, expansion, contraction, and shrink.

Numra’s implementation uses adaptive parameters (Gao & Han, 2012) that scale with problem dimension:

$$\alpha = 1, \qquad \beta = 1 + \frac{2}{n}, \qquad \gamma = \frac{3}{4} - \frac{1}{2n}, \qquad \delta = 1 - \frac{1}{n}$$

These adaptive parameters significantly improve convergence in high dimensions compared to the classical fixed parameters $(\alpha, \beta, \gamma, \delta) = (1, 2, \tfrac{1}{2}, \tfrac{1}{2})$.
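For reference, the adaptive parameters can be computed directly from the dimension. A small standalone sketch (the helper name is hypothetical, not Numra’s API):

```rust
/// Gao & Han (2012) adaptive Nelder-Mead parameters for dimension n.
fn gao_han_params(n: usize) -> (f64, f64, f64, f64) {
    let n = n as f64;
    let alpha = 1.0;                     // reflection
    let beta = 1.0 + 2.0 / n;            // expansion
    let gamma = 0.75 - 1.0 / (2.0 * n);  // contraction
    let delta = 1.0 - 1.0 / n;           // shrink
    (alpha, beta, gamma, delta)
}

// For n = 2 this recovers the classical values (1, 2, 0.5, 0.5).
```

The full solver is available through `nelder_mead`: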
```rust
use numra::optim::{nelder_mead, NelderMeadOptions};

// Beale function (non-convex, gradient not readily available)
let f = |x: &[f64]| {
    (1.5 - x[0] + x[0] * x[1]).powi(2)
        + (2.25 - x[0] + x[0] * x[1] * x[1]).powi(2)
        + (2.625 - x[0] + x[0] * x[1].powi(3)).powi(2)
};

let opts = NelderMeadOptions {
    max_iter: 50_000,
    fatol: 1e-8,
    xatol: 1e-8,
    adaptive: true,
    ..Default::default()
};
let result = nelder_mead(f, &[1.0, 1.0], &opts).unwrap();

assert!(result.converged);
assert!(result.f < 1e-4);
println!("Function evaluations: {}", result.n_feval);
```

### Key Options
| Option | Default | Description |
|---|---|---|
| `fatol` | 1e-8 | Function value tolerance (convergence when spread < `fatol`) |
| `xatol` | 1e-8 | Simplex size tolerance |
| `initial_simplex_scale` | 0.05 | Scale factor for initial simplex construction |
| `adaptive` | true | Use adaptive Gao-Han parameters |
| `max_iter` | 10000 | Maximum iterations |
**Convergence:** Nelder-Mead converges linearly at best and can stall on degenerate problems. It uses zero gradient evaluations (`n_geval = 0`), making it ideal for black-box functions, noisy objectives, or non-differentiable problems.
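As an illustration, here is a sketch of minimizing a non-differentiable objective with the `nelder_mead` API shown above. The objective $f(x) = |x_0 - 1| + |x_1 + 2|$ and the starting point are made up for this example; it has a kink at the minimizer, so no gradient exists there.

```rust
use numra::optim::{nelder_mead, NelderMeadOptions};

// Non-differentiable at the minimizer (1, -2): gradient-based methods struggle here.
let f = |x: &[f64]| (x[0] - 1.0).abs() + (x[1] + 2.0).abs();

let result = nelder_mead(f, &[0.0, 0.0], &NelderMeadOptions::default()).unwrap();
println!("x = {:?}, f = {:.3e}, n_geval = {}", result.x, result.f, result.n_geval);
```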
## Powell’s Conjugate Direction Method

Powell’s method builds a set of conjugate directions by performing sequential 1-D line minimizations along each direction, then replacing the direction along which the largest decrease occurred with the net displacement direction, as sketched below. Each line minimization uses Brent’s method (combining parabolic interpolation with golden-section search).
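The following is a minimal sketch of one outer Powell pass under stated assumptions: a simple golden-section `line_min` stands in for the Brent line search Numra actually uses, the bracket `[-10, 10]` is arbitrary, and all names are hypothetical.

```rust
// Golden-section line search: a simple stand-in for Brent's method.
fn line_min(f: &dyn Fn(f64) -> f64, mut a: f64, mut b: f64, tol: f64) -> f64 {
    let inv_phi = (5f64.sqrt() - 1.0) / 2.0; // ~0.618
    while (b - a).abs() > tol {
        let c = b - inv_phi * (b - a);
        let d = a + inv_phi * (b - a);
        if f(c) < f(d) { b = d } else { a = c }
    }
    0.5 * (a + b)
}

// One outer pass: line-minimize along each direction, then replace the
// direction of largest decrease with the net displacement direction.
fn powell_pass(f: &dyn Fn(&[f64]) -> f64, x: &mut Vec<f64>, dirs: &mut [Vec<f64>]) {
    let x_start = x.clone();
    let (mut best_drop, mut best_idx) = (0.0, 0);
    for (i, d) in dirs.iter().enumerate() {
        let f_before = f(x);
        let xc = x.clone();
        let alpha = line_min(
            &|t| f(&xc.iter().zip(d).map(|(xi, di)| xi + t * di).collect::<Vec<_>>()),
            -10.0, 10.0, 1e-8,
        );
        for (xi, di) in x.iter_mut().zip(d) { *xi += alpha * di; }
        let drop = f_before - f(x);
        if drop > best_drop { best_drop = drop; best_idx = i; }
    }
    dirs[best_idx] = x.iter().zip(&x_start).map(|(a, b)| a - b).collect();
}
```

Using Numra’s `powell` directly: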
```rust
use numra::optim::{powell, PowellOptions};

let f = |x: &[f64]| (x[0] - 2.0).powi(2) + (x[1] - 3.0).powi(2);
let opts = PowellOptions::default();
let result = powell(f, &[0.0, 0.0], &opts).unwrap();

assert!(result.converged);
assert!((result.x[0] - 2.0).abs() < 1e-4);
assert!((result.x[1] - 3.0).abs() < 1e-4);
```

### Key Options
| Option | Default | Description |
|---|---|---|
| `ftol` | 1e-8 | Function change convergence tolerance |
| `xtol` | 1e-8 | Step size convergence tolerance |
| `max_line_search_iter` | 100 | Maximum iterations per Brent line search |
| `max_iter` | 10000 | Maximum outer iterations |
## Solver Comparison

| Solver | Gradients | Storage | Convergence | Best For |
|---|---|---|---|---|
| BFGS | Required | $O(n^2)$ | Superlinear | Smooth, small-to-medium $n$ |
| L-BFGS | Required | $O(mn)$ | Superlinear | Smooth, large $n$ |
| Nelder-Mead | None | $O(n^2)$ | Linear | Black-box, noisy, non-smooth |
| Powell | None | $O(n^2)$ | Superlinear (quadratic termination) | Derivative-free, smooth |
## Common Options: OptimOptions

All gradient-based solvers share the `OptimOptions` struct:
```rust
use numra::optim::OptimOptions;

let opts = OptimOptions::default()
    .max_iter(2000)
    .gtol(1e-10)   // gradient norm threshold
    .ftol(1e-14)   // relative function change threshold
    .xtol(1e-14);  // step size threshold
```

Default values: `max_iter = 1000`, `gtol = 1e-8`, `ftol = 1e-12`, `xtol = 1e-12`.
## Reading the OptimResult

Every solver returns an `OptimResult<S>`:
```rust
let result = bfgs_minimize(f, grad, &x0, &opts).unwrap();

println!("Minimizer: {:?}", result.x);
println!("Minimum: {:.6e}", result.f);
println!("Gradient: {:?}", result.grad);
println!("Converged: {}", result.converged);
println!("Status: {:?}", result.status); // e.g., GradientConverged
println!("Iterations: {}", result.iterations);
println!("f evaluations: {}", result.n_feval);
println!("g evaluations: {}", result.n_geval);
println!("Wall time: {:.3} s", result.wall_time_secs);

// Convergence history (objective, gradient norm per iteration)
for rec in &result.history {
    println!(
        "  iter {}: f={:.4e}, ||g||={:.2e}, alpha={:.2e}",
        rec.iteration, rec.objective, rec.gradient_norm, rec.step_size
    );
}
```

### Convergence Status Variants
| OptimStatus | Meaning |
|---|---|
| `GradientConverged` | Gradient norm $\|\nabla f\|$ fell below `gtol` |
| `FunctionConverged` | Relative function change fell below `ftol` |
| `StepConverged` | Step size fell below `xtol` |
| `MaxIterations` | Iteration limit reached without convergence |
| `LineSearchFailed` | Wolfe line search could not find an acceptable step |
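A typical pattern is to branch on the status after a solve. A sketch, assuming `result` came from one of the solvers above:

```rust
use numra::optim::OptimStatus;

// `result` is assumed to come from one of the solvers above.
match result.status {
    OptimStatus::GradientConverged
    | OptimStatus::FunctionConverged
    | OptimStatus::StepConverged => println!("Converged: x = {:?}", result.x),
    OptimStatus::MaxIterations => eprintln!("Hit the iteration limit; consider raising max_iter"),
    OptimStatus::LineSearchFailed => eprintln!("Line search failed; double-check the gradient"),
    _ => {} // defensively handle any variants not listed here
}
```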