
Global Optimization

Local optimizers (BFGS, L-BFGS, Nelder-Mead) converge to the nearest local minimum from their starting point. Global optimizers search the entire feasible region for the global minimum, making them essential for multimodal, non-convex problems.
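To make the difference concrete, here is a sketch contrasting the two on the Rastrigin function. It reuses `bfgs_minimize` and `de_minimize` from this library, but the gradient closure type (`Fn(&[f64]) -> Vec<f64>`) is an assumption:

use numra::optim::{bfgs_minimize, de_minimize, DEOptions, OptimOptions};
use std::f64::consts::PI;

// Rastrigin: global minimum f = 0 at the origin, many local minima elsewhere.
let f = |x: &[f64]| {
    10.0 * x.len() as f64
        + x.iter().map(|xi| xi * xi - 10.0 * (2.0 * PI * xi).cos()).sum::<f64>()
};
// Analytic gradient: d/dx_i = 2 x_i + 20 pi sin(2 pi x_i).
let grad = |x: &[f64]| {
    x.iter().map(|xi| 2.0 * xi + 20.0 * PI * (2.0 * PI * xi).sin()).collect::<Vec<f64>>()
};

// A local solver started at (3, 3) typically stalls in a nearby local minimum...
let local = bfgs_minimize(f, grad, &[3.0, 3.0], &OptimOptions::default()).unwrap();
// ...while DE searches the whole box and approaches the global minimum.
let global = de_minimize(f, &vec![(-5.12, 5.12); 2], &DEOptions::default()).unwrap();
println!("BFGS from (3,3): f = {:.3}; DE: f = {:.3e}", local.f, global.f);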

Numra provides two population-based global optimizers: Differential Evolution (DE) and CMA-ES (Covariance Matrix Adaptation Evolution Strategy).

Differential Evolution maintains a population of $N_p$ candidate solutions and evolves them through mutation, crossover, and selection. The specific strategy implemented is DE/rand/1/bin (a minimal code sketch follows the three steps below):

  1. Mutation: For each individual $x_i$, pick three distinct random individuals $x_a, x_b, x_c$ and form a donor vector:

$$v_i = x_a + F \cdot (x_b - x_c),$$

where $F \in [0, 2]$ is the differential weight.

  2. Binomial crossover: Create a trial vector $u_i$ where each component is taken from $v_i$ with probability $CR$ (crossover rate), otherwise from $x_i$. At least one component is always taken from $v_i$.

  3. Selection: Replace $x_i$ with $u_i$ if $f(u_i) \le f(x_i)$.
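
To make the three steps concrete, here is a minimal, self-contained sketch of one DE/rand/1/bin generation. It is illustrative only: the xorshift RNG, the helper names (de_generation, rand_unit), and the in-place replacement policy are assumptions, not Numra's implementation.

fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

fn rand_unit(state: &mut u64) -> f64 {
    // Uniform in [0, 1) from the top 53 bits.
    (xorshift(state) >> 11) as f64 / (1u64 << 53) as f64
}

/// One DE/rand/1/bin generation over `pop` (assumes pop.len() >= 4).
fn de_generation(
    pop: &mut Vec<Vec<f64>>,
    fitness: &mut Vec<f64>,
    f: impl Fn(&[f64]) -> f64,
    f_weight: f64, // differential weight F
    cr: f64,       // crossover rate CR
    state: &mut u64,
) {
    let (np, n) = (pop.len(), pop[0].len());
    for i in 0..np {
        // Mutation: pick three distinct indices a, b, c, all different from i.
        let mut idx = [0usize; 3];
        for k in 0..3 {
            loop {
                let r = (xorshift(state) % np as u64) as usize;
                if r != i && !idx[..k].contains(&r) {
                    idx[k] = r;
                    break;
                }
            }
        }
        let (a, b, c) = (idx[0], idx[1], idx[2]);
        // Binomial crossover: j_rand guarantees at least one donor component.
        let j_rand = (xorshift(state) % n as u64) as usize;
        let trial: Vec<f64> = (0..n)
            .map(|j| {
                if j == j_rand || rand_unit(state) < cr {
                    pop[a][j] + f_weight * (pop[b][j] - pop[c][j]) // donor v_i
                } else {
                    pop[i][j] // inherit from target x_i
                }
            })
            .collect();
        // Selection: keep the trial vector only if it is no worse.
        let f_trial = f(&trial);
        if f_trial <= fitness[i] {
            pop[i] = trial;
            fitness[i] = f_trial;
        }
    }
}

In practice you call de_minimize rather than rolling your own loop, as in the examples that follow.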

use numra::optim::{de_minimize, DEOptions};

// Rastrigin function: many local minima, global minimum at the origin
let f = |x: &[f64]| {
    let n = x.len() as f64;
    10.0 * n + x.iter()
        .map(|xi| xi * xi - 10.0 * (2.0 * std::f64::consts::PI * xi).cos())
        .sum::<f64>()
};

let bounds = vec![(-5.12, 5.12); 2];
let opts = DEOptions {
    max_generations: 1000,
    ..Default::default()
};

let result = de_minimize(f, &bounds, &opts).unwrap();
assert!(result.f < 1.0); // near the global minimum f = 0

println!("Best: x={:?}, f={:.6}", result.x, result.f);
println!("Generations: {}, f_evals: {}", result.iterations, result.n_feval);
DE also handles narrow, curved valleys such as the Rosenbrock function:

use numra::optim::{de_minimize, DEOptions};

// Rosenbrock function: global minimum f = 0 at (1, 1)
let f = |x: &[f64]| {
    (1.0 - x[0]).powi(2) + 100.0 * (x[1] - x[0] * x[0]).powi(2)
};

let bounds = vec![(-5.0, 10.0), (-5.0, 10.0)];
let opts = DEOptions {
    max_generations: 2000,
    ..Default::default()
};

let result = de_minimize(f, &bounds, &opts).unwrap();
assert!(result.f < 0.01);
assert!((result.x[0] - 1.0).abs() < 0.1);
assert!((result.x[1] - 1.0).abs() < 0.1);
| Option | Default | Description |
|---|---|---|
| `pop_size` | `None` (= 10n) | Population size |
| `max_generations` | 1000 | Maximum number of generations |
| `differential_weight` | 0.8 | Mutation scale factor $F \in [0, 2]$ |
| `crossover_rate` | 0.9 | Crossover probability $CR \in [0, 1]$ |
| `tol` | 1e-12 | Convergence tolerance on fitness spread |
| `seed` | 42 | Random seed for reproducibility |

Tuning guidelines (a configuration sketch follows the list):

  • Increase pop_size for highly multimodal landscapes.
  • Use $F \in [0.5, 1.0]$ and $CR \in [0.7, 1.0]$ as a starting point.
  • Lower $CR$ (e.g., 0.1–0.3) for separable functions.
  • The algorithm is deterministic given the same seed.
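
As a sketch of these guidelines in code (field names are taken from the DEOptions table above; `pop_size` being an `Option` is inferred from its `None` default):

use numra::optim::DEOptions;

// A configuration for a hard, non-separable multimodal landscape.
let opts = DEOptions {
    pop_size: Some(50),       // larger than the 10n default for more exploration
    differential_weight: 0.7, // F in the recommended [0.5, 1.0] range
    crossover_rate: 0.9,      // high CR for non-separable objectives
    ..Default::default()
};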

CMA-ES adapts a multivariate normal distribution $\mathcal{N}(m_k, \sigma_k^2 C_k)$ to the landscape. At each generation it:

  1. Samples $\lambda$ offspring from the current distribution.
  2. Ranks them by fitness.
  3. Recombines the best $\mu$ to update the mean $m_{k+1}$ (see the recombination formula below).
  4. Updates the covariance matrix $C_{k+1}$ using evolution paths (cumulation) for rank-one and rank-$\mu$ updates.
  5. Updates the global step size $\sigma_{k+1}$ via cumulative step-size adaptation (CSA).
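
As one concrete instance of step 3 (the weighted recombination standard in the CMA-ES literature, following Hansen's tutorial; Numra's exact weighting scheme may differ), the new mean is:

$$m_{k+1} = \sum_{i=1}^{\mu} w_i \, x_{i:\lambda}, \qquad w_1 \ge \cdots \ge w_\mu > 0, \quad \sum_{i=1}^{\mu} w_i = 1,$$

where $x_{i:\lambda}$ denotes the $i$-th best of the $\lambda$ offspring sampled in step 1.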

CMA-ES excels on ill-conditioned problems because the covariance adaptation learns the local curvature.

use numra::optim::{cmaes_minimize, CmaEsOptions};

// Ill-conditioned ellipsoid: f(x) = sum_i 10^(6i/(n-1)) * x_i^2
let n = 5;
let f = |x: &[f64]| {
    x.iter().enumerate()
        .map(|(i, xi)| 10.0_f64.powf(6.0 * i as f64 / (n as f64 - 1.0)) * xi * xi)
        .sum::<f64>()
};

let bounds = vec![(-5.0, 5.0); n];
let opts = CmaEsOptions {
    max_generations: 5000,
    ..Default::default()
};

// `vec![3.0; n]` (not `[3.0; n]`): the initial point's length is chosen at runtime.
let result = cmaes_minimize(f, &vec![3.0; n], &bounds, &opts).unwrap();
assert!(result.f < 1e-6);
println!("Solution: {:?}", result.x);
| Option | Default | Description |
|---|---|---|
| `pop_size` | `None` (= 4 + floor(3 ln n)) | Population size $\lambda$ |
| `max_generations` | 10000 | Maximum generations |
| `sigma_init` | 0.3 | Initial step size (fraction of domain) |
| `tol` | 1e-12 | Convergence tolerance |
| `seed` | 42 | Random seed |
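
For example (assuming the field names shown in the table above), a run with a broader initial sampling distribution and a fixed seed might be configured like this:

use numra::optim::CmaEsOptions;

let opts = CmaEsOptions {
    sigma_init: 0.5, // wider initial step size to cover more of the domain
    seed: 123,       // fixed for reproducibility
    ..Default::default()
};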
| Property | DE | CMA-ES |
|---|---|---|
| Population size | Large (10n) | Small (4 + 3 ln n) |
| Per-generation cost | $O(N_p \cdot n)$ | $O(\lambda \cdot n + n^2)$ for covariance |
| Strengths | Robust, simple tuning | Ill-conditioned, correlated problems |
| Weaknesses | Slow on correlated problems | $O(n^2)$ memory for covariance |
| Bounds handling | Native (clipping) | Native (clipping) |
| Gradient-free | Yes | Yes |
| Best for | General multimodal, $n \le 100$ | Moderate dimension, ill-conditioned |

The OptimProblem builder supports global optimization with .global(true):

use numra::optim::OptimProblem;

let result = OptimProblem::new(2)
    .objective(|x: &[f64]| {
        (1.0 - x[0]).powi(2) + 100.0 * (x[1] - x[0] * x[0]).powi(2)
    })
    .all_bounds(&[(-5.0, 5.0), (-5.0, 5.0)])
    .global(true)
    .solve()
    .unwrap();

println!("x = {:?}, f = {:.6}", result.x, result.f);

When .global(true) is set, the auto solver dispatches to Differential Evolution. Custom DE options can be provided with .de_options(opts).
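
A short sketch combining the two (the builder methods .global and .de_options are named above; the rest mirrors the earlier examples):

use numra::optim::{DEOptions, OptimProblem};

// Pass tuned DE settings through the builder.
let opts = DEOptions {
    max_generations: 500,
    ..Default::default()
};
let result = OptimProblem::new(2)
    .objective(|x: &[f64]| x[0] * x[0] + x[1] * x[1])
    .all_bounds(&[(-5.0, 5.0), (-5.0, 5.0)])
    .global(true)
    .de_options(opts)
    .solve()
    .unwrap();
println!("x = {:?}, f = {:.6}", result.x, result.f);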

Both DE and CMA-ES are deterministic given the same seed:

use numra::optim::{de_minimize, DEOptions};
let f = |x: &[f64]| x[0] * x[0] + x[1] * x[1];
let bounds = vec![(-5.0, 5.0); 2];
let r1 = de_minimize(f, &bounds, &DEOptions::default()).unwrap();
let r2 = de_minimize(f, &bounds, &DEOptions::default()).unwrap();
assert_eq!(r1.x, r2.x);
assert_eq!(r1.f, r2.f);

This is important for debugging and scientific reproducibility. Change the seed to explore different search trajectories.
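
For instance, two different seeds produce two independent search trajectories over the same problem (a small sketch using the seed field from the options table):

use numra::optim::{de_minimize, DEOptions};

let f = |x: &[f64]| x[0] * x[0] + x[1] * x[1];
let bounds = vec![(-5.0, 5.0); 2];
// Same problem, different seeds: independent trajectories, both near the optimum.
let r1 = de_minimize(f, &bounds, &DEOptions { seed: 1, ..Default::default() }).unwrap();
let r2 = de_minimize(f, &bounds, &DEOptions { seed: 2, ..Default::default() }).unwrap();
println!("seed 1: f = {:.3e}; seed 2: f = {:.3e}", r1.f, r2.f);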

  1. Start with DE for general multimodal problems. It requires only bounds (no initial point) and has few tuning parameters.

  2. Switch to CMA-ES if the problem is poorly conditioned (condition number > 100) or the variables are correlated.

  3. Refine with a local solver: After global search, polish the result with BFGS or L-BFGS for maximum accuracy:

use numra::optim::{de_minimize, bfgs_minimize, DEOptions, OptimOptions};

// Sphere objective and gradient (gradient type assumed: Fn(&[f64]) -> Vec<f64>).
let f = |x: &[f64]| x.iter().map(|xi| xi * xi).sum::<f64>();
let grad = |x: &[f64]| x.iter().map(|xi| 2.0 * xi).collect::<Vec<f64>>();
let bounds = vec![(-5.0, 5.0); 2];

// Phase 1: Global search
let global = de_minimize(f, &bounds, &DEOptions::default()).unwrap();
// Phase 2: Local refinement starting from the global search result
let local = bfgs_minimize(f, grad, &global.x, &OptimOptions::default()).unwrap();
println!("refined: x = {:?}, f = {:.3e}", local.x, local.f);

This two-phase approach combines the global exploration of DE with the superlinear convergence of BFGS.