Conjugate gradient methods combine steepest descent with conjugate directions for faster optimization. They generate search directions without storing large matrices, making them great for big problems. These methods are like a supercharged version of regular gradient descent.

Fletcher-Reeves and Polak-Ribière are two popular conjugate gradient techniques. They use different formulas to update search directions, with Polak-Ribière often performing better in practice. These methods are like turbo-boosters for optimization algorithms.

Conjugate Gradient Methods

Principles of Conjugate Directions

  • Conjugate directions form basis for efficient optimization algorithms
  • Vectors u and v are conjugate with respect to matrix A if $u^T A v = 0$
  • Conjugate direction methods generate a sequence of mutually conjugate search directions
  • Optimizing along each direction in turn leads to faster convergence than steepest descent
  • Conjugate gradient method combines ideas from steepest descent and conjugate directions
  • Generates conjugate directions without explicitly storing matrix A (see the sketch after this list)
  • Particularly effective for large-scale problems with sparse matrices
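
To make the "no stored matrix" point concrete, here is a minimal sketch of the conjugate gradient method for the quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ (equivalently, solving $Ax = b$), assuming $A$ is symmetric positive definite and available only through a matrix-vector product; the function name `conjugate_gradient` and its arguments are illustrative, not from the text.

```python
import numpy as np

def conjugate_gradient(matvec, b, x0, tol=1e-8, max_iter=None):
    """Minimize f(x) = 0.5 x^T A x - b^T x (i.e. solve A x = b), where the
    symmetric positive definite A is accessed only via matvec(v) = A v."""
    x = x0.astype(float)
    r = b - matvec(x)                  # residual = negative gradient at x
    d = r.copy()                       # first direction: steepest descent
    rs_old = r @ r
    max_iter = max_iter or len(b)
    for _ in range(max_iter):
        Ad = matvec(d)
        alpha = rs_old / (d @ Ad)      # exact minimizer along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:      # small residual -> done
            break
        d = r + (rs_new / rs_old) * d  # new direction, A-conjugate to the old ones
        rs_old = rs_new
    return x

# Usage: only a matvec closure is passed, never the matrix itself
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, b, x0=np.zeros(2))
```

Because only `matvec` is needed, the per-iteration cost for a sparse Hessian is proportional to its number of nonzeros, which is what makes the method attractive for large-scale problems.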

Fletcher-Reeves and Polak-Ribière Methods

  • Fletcher-Reeves method calculates conjugate directions using gradient information
  • Update formula for search direction: $d_{k+1} = -g_{k+1} + \beta_k^{FR} d_k$
  • Fletcher-Reeves parameter: $\beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$
  • Polak-Ribière method modifies Fletcher-Reeves approach for improved performance
  • Polak-Ribière parameter: $\beta_k^{PR} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{g_k^T g_k}$
  • Both methods generate conjugate directions without storing the entire matrix (as shown in the sketch after this list)
  • Polak-Ribière often performs better in practice due to its ability to self-correct
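
As a rough illustration of how the two update rules slot into the same algorithm, here is a sketch of nonlinear conjugate gradient with a simple backtracking (Armijo) line search; the line-search constants, the `beta_rule` switch, and the Rosenbrock test function are illustrative choices, not prescribed by the text.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, beta_rule="PR", tol=1e-6, max_iter=500):
    """Nonlinear CG with Fletcher-Reeves ("FR") or Polak-Ribiere ("PR") beta."""
    x = x0.astype(float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                     # not a descent direction: fall back
            d = -g
        # Backtracking (Armijo) line search along d
        alpha, fx, slope = 1.0, f(x), g @ d
        while f(x + alpha * d) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        if beta_rule == "FR":
            beta = (g_new @ g_new) / (g @ g)
        else:                              # PR, truncated at zero (a mild self-correction)
            beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Usage on the Rosenbrock function
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
x_min = nonlinear_cg(f, grad, np.array([-1.2, 1.0]))   # approaches (1, 1)
```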

Advanced Conjugate Gradient Techniques

  • Hestenes-Stiefel method provides alternative update formula for conjugate directions
  • Hestenes-Stiefel parameter: $\beta_k^{HS} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{d_k^T (g_{k+1} - g_k)}$
  • Nonlinear conjugate gradient extends method to nonquadratic objective functions
  • Applies conjugate gradient ideas to general nonlinear optimization problems
  • Requires line search to determine step size along search direction
  • Various beta formulas available for nonlinear case (Fletcher-Reeves, Polak-Ribière, Hestenes-Stiefel); see the helper functions sketched after this list
  • Choice of beta formula affects convergence behavior and robustness
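
The three update rules differ only in how $\beta_k$ is computed, so they can be written as interchangeable helpers and plugged into a nonlinear CG loop such as the sketch above; the function names below are illustrative, and the inputs are assumed to be NumPy gradient and direction vectors.

```python
def beta_fr(g_new, g_old, d_old):
    """Fletcher-Reeves (d_old kept only for a uniform signature)."""
    return (g_new @ g_new) / (g_old @ g_old)

def beta_pr(g_new, g_old, d_old):
    """Polak-Ribiere."""
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

def beta_hs(g_new, g_old, d_old):
    """Hestenes-Stiefel."""
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)

# Each rule feeds the same direction update: d_new = -g_new + beta * d_old
```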

Convergence and Preconditioning

Quadratic Convergence Properties

  • Conjugate gradient method achieves finite termination on quadratic objective functions
  • Converges in at most n iterations (in exact arithmetic) for an n-dimensional problem
  • Convergence rate depends on eigenvalue distribution of Hessian matrix
  • Clustered eigenvalues lead to faster convergence
  • Ill-conditioned problems with widely spread eigenvalues converge more slowly
  • Practical performance often better than worst-case bound suggests (bound stated after this list)
  • Superlinear convergence observed for well-behaved nonlinear problems
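
For reference, the classical worst-case bound from the analysis of CG on quadratics makes the dependence on conditioning explicit: with $\kappa = \lambda_{\max}/\lambda_{\min}$ the condition number of the Hessian $A$, the error satisfies $\|x_k - x^*\|_A \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|x_0 - x^*\|_A$, where $\|v\|_A = \sqrt{v^T A v}$. Clustered eigenvalues admit much sharper polynomial-based bounds, which is why they speed up convergence in practice.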

Preconditioning Techniques

  • Preconditioning improves convergence of conjugate gradient method
  • Transforms original problem into equivalent problem with better conditioning
  • Preconditioner M approximates the Hessian matrix, so applying $M^{-1}$ approximates applying its inverse
  • Preconditioned conjugate gradient solves the equivalent system $M^{-1}Ax = M^{-1}b$
  • Common preconditioners include diagonal scaling, incomplete Cholesky factorization
  • Jacobi preconditioner uses diagonal of matrix A (used in the sketch after this list)
  • Symmetric successive over-relaxation (SSOR) preconditioner based on matrix splitting
  • Effective preconditioner balances improved convergence with computational cost
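
Below is a minimal sketch of preconditioned CG with the Jacobi (diagonal) preconditioner, assuming a dense symmetric positive definite NumPy array `A` with a nonzero diagonal; the function name and defaults are illustrative.

```python
import numpy as np

def preconditioned_cg(A, b, x0=None, tol=1e-8, max_iter=None):
    """CG on A x = b with Jacobi preconditioner M = diag(A);
    applying M^{-1} is just elementwise division by the diagonal."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    diag = np.diag(A)                  # Jacobi preconditioner M = diag(A)
    r = b - A @ x                      # residual
    z = r / diag                       # z = M^{-1} r
    d = z.copy()
    rz_old = r @ z
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = rz_old / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) < tol:
            break
        z = r / diag
        rz_new = r @ z
        d = z + (rz_new / rz_old) * d  # preconditioned conjugate direction
        rz_old = rz_new
    return x
```

Replacing the elementwise division with triangular solves against an incomplete Cholesky factor would give the incomplete-Cholesky variant mentioned above.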

Restart Strategies and Implementation

  • Restart strategies periodically reset conjugate gradient algorithm
  • Helps maintain conjugacy in presence of rounding errors or nonlinearity
  • Full restart resets search direction to negative gradient every n iterations
  • Partial restart retains some information from previous iterations
  • Powell restart resets the direction when consecutive gradients are far from orthogonal (see the check sketched after this list)
  • Limited-memory methods store small number of vectors to approximate Hessian
  • Implementation considerations include efficient matrix-vector products
  • Exploit sparsity structure of problem for computational efficiency
  • Termination criteria based on gradient norm or relative change in objective function
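
As one concrete version of the restart test, the sketch below checks whether consecutive gradients have lost orthogonality; the threshold `nu` is a typical textbook choice rather than a universal constant, and the surrounding loop structure is assumed.

```python
def powell_restart_needed(g_new, g_old, nu=0.1):
    """Powell-style restart test on NumPy gradient vectors: trigger a restart
    when consecutive gradients are far from orthogonal (nu ~ 0.1 is a common choice)."""
    return abs(g_new @ g_old) >= nu * (g_new @ g_new)

# Inside a nonlinear CG loop (illustrative):
#   if powell_restart_needed(g_new, g) or (k + 1) % n == 0:
#       d = -g_new                 # full restart: reset to steepest descent
#   else:
#       d = -g_new + beta * d
```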