Conjugate gradient methods combine the steepest descent direction with conjugacy conditions for faster optimization. They generate search directions using only gradients and matrix-vector products, without storing large matrices, which makes them well suited to large-scale problems.
Fletcher-Reeves and Polak-Ribière are two popular conjugate gradient variants. They differ in the formula used to update the search direction, with Polak-Ribière often performing better in practice because it can self-correct when conjugacy degrades.
Conjugate Gradient Methods
Principles of Conjugate Directions
- Conjugate directions form basis for efficient optimization algorithms
- Vectors u and v conjugate with respect to matrix A if u^T A v = 0
- Conjugate direction methods generate sequence of search directions mutually conjugate to each other
- Optimizing along each direction sequentially leads to faster convergence than steepest descent
- Conjugate gradient method combines ideas from steepest descent and conjugate directions
- Generates conjugate directions without explicitly storing matrix A
- Particularly effective for large-scale problems with sparse matrices
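The ideas above can be sketched as a minimal linear conjugate gradient solver for A x = b with A symmetric positive definite (equivalently, minimizing ½ xᵀAx − bᵀx). Only products A @ d are needed, so A could equally be a routine that applies a sparse operator; the function name and defaults here are illustrative, not from the source.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Minimal linear CG sketch for A x = b, A symmetric positive definite."""
    n = len(b)
    max_iter = max_iter or n          # at most n iterations in exact arithmetic
    x = np.zeros(n)
    r = b - A @ x                      # residual = negative gradient
    d = r.copy()                       # first direction: steepest descent
    rs_old = r @ r
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rs_old / (d @ Ad)      # exact minimizer along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        # new direction is A-conjugate to all previous directions
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return x

# Small SPD example
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

Note that the matrix appears only inside `A @ d`, which is what makes the method attractive for large sparse systems.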
Fletcher-Reeves and Polak-Ribière Methods
- Fletcher-Reeves method calculates conjugate directions using gradient information
- Update formula for search direction: d_{k+1} = -g_{k+1} + β_k^FR d_k
- Fletcher-Reeves parameter: β_k^FR = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)
- Polak-Ribière method modifies Fletcher-Reeves approach for improved performance
- Polak-Ribière parameter: β_k^PR = (g_{k+1}^T (g_{k+1} - g_k)) / (g_k^T g_k)
- Both methods generate conjugate directions without storing entire matrix
- Polak-Ribière often performs better in practice due to its ability to self-correct
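The two β formulas are short enough to state directly in code. This sketch uses illustrative gradient vectors; note that in practice the Polak-Ribière value is often clipped at zero (the "PR+" variant), which acts as a built-in restart when the method stops making progress.

```python
import numpy as np

def beta_fletcher_reeves(g_new, g_old):
    # β^FR_k = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere(g_new, g_old):
    # β^PR_k = g_{k+1}^T (g_{k+1} - g_k) / (g_k^T g_k)
    # Practical codes often use max(beta, 0), known as PR+.
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

# Illustrative consecutive gradients
g_old = np.array([1.0, -2.0])
g_new = np.array([0.5, 0.5])
b_fr = beta_fletcher_reeves(g_new, g_old)
b_pr = beta_polak_ribiere(g_new, g_old)
```

When g_new is orthogonal to g_old (as it would be on a quadratic with exact line searches), the two formulas coincide; they diverge precisely when conjugacy degrades, which is where PR's self-correction helps.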
Advanced Conjugate Gradient Techniques
- Hestenes-Stiefel method provides alternative update formula for conjugate directions
- Hestenes-Stiefel parameter: β_k^HS = (g_{k+1}^T (g_{k+1} - g_k)) / (d_k^T (g_{k+1} - g_k))
- Nonlinear conjugate gradient extends method to nonquadratic objective functions
- Applies conjugate gradient ideas to general nonlinear optimization problems
- Requires line search to determine step size along search direction
- Various beta formulas available for nonlinear case (Fletcher-Reeves, Polak-Ribière, Hestenes-Stiefel)
- Choice of beta formula affects convergence behavior and robustness
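A nonlinear conjugate gradient iteration with a line search can be sketched as follows. This is a simplified illustration, not a production implementation: the backtracking (Armijo) line search, the PR+ clipping, the descent safeguard, and the test function are all choices made here for the example.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, beta_rule="PR", tol=1e-8, max_iter=500):
    """Sketch of nonlinear CG with a backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        # Backtracking line search: shrink alpha until Armijo condition holds
        alpha, c, rho = 1.0, 1e-4, 0.5
        while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
            alpha *= rho
        x_new = x + alpha * d
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        if beta_rule == "FR":
            beta = (g_new @ g_new) / (g @ g)
        else:  # PR+, clipped at zero as an automatic restart
            beta = max(0.0, (g_new @ (g_new - g)) / (g @ g))
        d = -g_new + beta * d
        if g_new @ d >= 0:   # safeguard: restart if not a descent direction
            d = -g_new
        x, g = x_new, g_new
    return x

# Nonquadratic strongly convex test function with minimizer at the origin
f = lambda x: x[0] ** 2 + x[1] ** 2 + 0.1 * x[0] ** 4
grad = lambda x: np.array([2 * x[0] + 0.4 * x[0] ** 3, 2 * x[1]])
x_star = nonlinear_cg(f, grad, [3.0, 2.0])
```

Switching `beta_rule` between `"FR"` and `"PR"` on harder problems is a simple way to observe how the β formula affects robustness.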
Convergence and Preconditioning
Convergence Properties
- Conjugate gradient method terminates finitely on quadratic objective functions
- In exact arithmetic, converges in at most n iterations for an n-dimensional problem
- Convergence rate depends on eigenvalue distribution of Hessian matrix
- Clustered eigenvalues lead to faster convergence
- Ill-conditioned problems with widely spread eigenvalues converge more slowly
- Practical performance often better than worst-case bound suggests
- Superlinear convergence observed for well-behaved nonlinear problems
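The effect of eigenvalue clustering can be seen numerically: on a quadratic whose Hessian has only k distinct eigenvalues, CG converges in about k iterations, far below the worst-case bound of n. The construction below (20×20 SPD matrix with three exactly repeated eigenvalues) is an illustrative experiment, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

# SPD matrix with only 3 distinct eigenvalues (1, 5, 25)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
eigs = np.repeat([1.0, 5.0, 25.0], [7, 7, 6])
A = Q @ np.diag(eigs) @ Q.T
b = rng.standard_normal(n)

# Plain CG, counting iterations until the relative residual is tiny
x = np.zeros(n)
r = b.copy()
d = r.copy()
iters = 0
while np.linalg.norm(r) > 1e-10 * np.linalg.norm(b) and iters < n:
    Ad = A @ d
    alpha = (r @ r) / (d @ Ad)
    x += alpha * d
    r_new = r - alpha * Ad
    d = r_new + ((r_new @ r_new) / (r @ r)) * d
    r = r_new
    iters += 1
```

Despite n = 20, the iteration count stays near the number of distinct eigenvalues, which is why preconditioners that cluster the spectrum are so effective.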
Preconditioning Techniques
- Preconditioning improves convergence of conjugate gradient method
- Transforms original problem into equivalent problem with better conditioning
- Preconditioner M approximates the Hessian (or system matrix A) while remaining cheap to invert
- Preconditioned conjugate gradient implicitly solves the equivalent system M^{-1} A x = M^{-1} b
- Common preconditioners include diagonal scaling, incomplete Cholesky factorization
- Jacobi preconditioner uses diagonal of matrix A
- Symmetric successive over-relaxation (SSOR) preconditioner based on matrix splitting
- Effective preconditioner balances improved convergence with computational cost
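A preconditioned CG sketch makes the trade-off concrete: the algorithm needs only a routine that applies M⁻¹ to a vector, and for the Jacobi preconditioner that routine is an elementwise division by the diagonal of A. The function and parameter names here are illustrative.

```python
import numpy as np

def preconditioned_cg(A, b, apply_M_inv, tol=1e-10, max_iter=None):
    """Preconditioned CG sketch; apply_M_inv(r) applies M^{-1} to a vector."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x
    z = apply_M_inv(r)            # preconditioned residual
    d = z.copy()
    rz_old = r @ z
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rz_old / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) < tol:
            break
        z = apply_M_inv(r)
        rz_new = r @ z
        d = z + (rz_new / rz_old) * d
        rz_old = rz_new
    return x

# Jacobi preconditioner: M = diag(A), so applying M^{-1} is a division
A = np.array([[10.0, 1.0, 0.0],
              [1.0, 8.0, 1.0],
              [0.0, 1.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
diag = np.diag(A)
x = preconditioned_cg(A, b, lambda r: r / diag)
```

A more powerful preconditioner (e.g. incomplete Cholesky) would replace the lambda with a triangular solve; the CG loop itself is unchanged, which is the point of the abstraction.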
Restart Strategies and Implementation
- Restart strategies periodically reset conjugate gradient algorithm
- Helps maintain conjugacy in presence of rounding errors or nonlinearity
- Full restart resets search direction to negative gradient every n iterations
- Partial restart retains some information from previous iterations
- Powell restart resets when consecutive gradients are far from orthogonal (conjugacy has degraded)
- Limited-memory methods store small number of vectors to approximate Hessian
- Implementation considerations include efficient matrix-vector products
- Exploit sparsity structure of problem for computational efficiency
- Termination criteria based on gradient norm or relative change in objective function
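Powell's restart test is commonly stated as: restart when |g_{k+1}ᵀ g_k| ≥ ν ‖g_{k+1}‖², with ν ≈ 0.2. The helper below is an illustrative sketch of that check; the threshold value is the conventional choice, not taken from the source.

```python
import numpy as np

def powell_restart_needed(g_new, g_old, threshold=0.2):
    """Powell's test: restart when consecutive gradients lose orthogonality.

    On a quadratic with exact line searches, consecutive gradients are
    orthogonal; a large |g_{k+1}^T g_k| signals degraded conjugacy, and the
    search direction should be reset to -g_{k+1}.
    """
    return abs(g_new @ g_old) >= threshold * (g_new @ g_new)

# Nearly parallel gradients should trigger a restart; orthogonal ones should not
g = np.array([1.0, 0.0])
restart = powell_restart_needed(np.array([0.9, 0.1]), g)
no_restart = powell_restart_needed(np.array([0.0, 1.0]), g)
```

In a full implementation this check would sit just after each gradient evaluation, falling back to the steepest descent direction whenever it fires.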