Quasi-Newton methods are clever tricks to speed up optimization without the heavy lifting of Newton's method. They approximate the inverse Hessian matrix, using info from previous steps to get better and better guesses.

BFGS and DFP are two popular Quasi-Newton methods. They use different formulas to update the approximation, but both aim for superlinear convergence. BFGS is generally more robust and efficient in practice.

Quasi-Newton Methods Motivation

Approximating the Inverse Hessian

  • Quasi-Newton methods iteratively approximate the inverse Hessian matrix without explicit computation
  • Address computational cost of Newton's method by maintaining and updating an approximation at each iteration
  • Utilize information from previous iterations to improve the approximation
  • Aim to achieve superlinear convergence (faster than first-order methods, typically slower than Newton's quadratic convergence)
  • Require only first-order derivative information (gradients) for computational efficiency in large-scale problems

Key Principles and Relationships

  • Secant condition requires the updated approximation to map the most recent step onto the observed change in gradient, mimicking the true Hessian along that step (written out after this list)
  • Like Newton's method, quasi-Newton methods build quadratic models of the objective function to determine search directions
  • The difference is that quasi-Newton methods use approximations where Newton's method uses exact Hessians
  • Examples of quasi-Newton methods include BFGS (Broyden-Fletcher-Goldfarb-Shanno) and DFP (Davidon-Fletcher-Powell)
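
For reference, the secant condition in the notation used throughout this guide, with $s_k$ the step and $y_k$ the gradient change:

$B_{k+1} s_k = y_k$, where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$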

BFGS and DFP Formulas

Derivation and Properties

  • BFGS update formula adds two rank-one corrections to the current Hessian approximation (a rank-two update)
  • Combines information from the current step and the change in gradient
  • DFP update formula has the same structure applied to the inverse Hessian approximation; the two formulas are duals, with the roles of the step and gradient-change vectors swapped
  • Both formulas expressed in terms of the current approximation, the step vector, and the difference in gradients between iterations
  • Sherman-Morrison-Woodbury formula crucial for efficient implementations
  • Allows direct updates of inverse Hessian approximation

Comparison of Update Formulas

  • BFGS and DFP updates satisfy secant condition
  • Maintain positive definiteness of the approximation matrix (provided the curvature condition $y_k^T s_k > 0$ holds)
  • Ensure descent directions in optimization
  • Differ in weighting of information from current iteration
  • Lead to different numerical properties and performance characteristics
  • BFGS update (Hessian approximation): $B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}$
  • DFP update (inverse Hessian approximation): $H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}$, where $B_k$ and $H_k$ are the current Hessian and inverse Hessian approximations, $s_k$ is the step vector, and $y_k$ is the difference in gradients between iterations (a code sketch of both updates follows this list)
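
A minimal NumPy sketch of the two updates as written above: BFGS applied to the Hessian approximation $B_k$ and DFP applied to the inverse Hessian approximation $H_k$. The function names are illustrative, and both assume the curvature condition $y_k^T s_k > 0$ holds.

```python
import numpy as np

def bfgs_update_hessian(B, s, y):
    """BFGS update of the Hessian approximation B_k: two rank-one corrections."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

def dfp_update_inverse(H, s, y):
    """DFP update of the inverse Hessian approximation H_k (the dual formula)."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)
```

A quick check of the secant condition: `bfgs_update_hessian(B, s, y) @ s` returns `y` (up to round-off), and `dfp_update_inverse(H, s, y) @ y` returns `s`.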

Implementing BFGS and DFP

Algorithm Structure

  • Initialize inverse Hessian approximation (typically identity matrix or scaled version)
  • Compute search direction: $p_k = -H_k \nabla f(x_k)$, where $H_k$ is the current inverse Hessian approximation
  • Perform a line search to determine the step size $\alpha_k$
  • Update solution: $x_{k+1} = x_k + \alpha_k p_k$
  • Update inverse Hessian approximation using BFGS or DFP formula
  • Efficient BFGS implementation uses Sherman-Morrison-Woodbury formula for direct updates
  • DFP update implemented directly on the inverse Hessian approximation (a sketch of the full loop follows this list)
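
A compact sketch of this loop, assuming both methods are run on the inverse Hessian approximation `H` and using a simple backtracking (Armijo) line search rather than a full Wolfe line search; all names and tolerances here are illustrative.

```python
import numpy as np

def backtracking_line_search(f, grad, x, p, alpha=1.0, c=1e-4, tau=0.5):
    """Shrink alpha until the Armijo sufficient-decrease condition holds."""
    fx, gTp = f(x), grad(x) @ p
    while f(x + alpha * p) > fx + c * alpha * gTp:
        alpha *= tau
    return alpha

def bfgs_inverse_update(H, s, y):
    """BFGS update of the inverse Hessian (Sherman-Morrison-Woodbury form)."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def dfp_inverse_update(H, s, y):
    """DFP update applied directly to the inverse Hessian approximation."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

def quasi_newton(f, grad, x0, update, tol=1e-6, max_iter=200):
    """Generic quasi-Newton loop; `update` is either inverse-Hessian formula."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                      # initial inverse Hessian approximation
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:         # gradient-norm termination test
            break
        p = -H @ g                          # search direction p_k = -H_k grad f(x_k)
        alpha = backtracking_line_search(f, grad, x, p)
        s = alpha * p                       # step s_k = x_{k+1} - x_k
        x_new = x + s
        y = grad(x_new) - g                 # gradient difference y_k
        if y @ s > 1e-10:                   # update only when curvature condition holds
            H = update(H, s, y)
        x = x_new
    return x
```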

Implementation Considerations

  • Careful implementation of the line search (e.g., enforcing the Wolfe conditions) ensures sufficient decrease in the objective function
  • It also keeps $y_k^T s_k > 0$, which maintains positive definiteness of the approximation
  • Numerical stability safeguards guard against division by (near) zero, typically by skipping the update when $y_k^T s_k$ is too small
  • Maintaining symmetry of the approximation (e.g., re-symmetrizing after each update) makes implementations more robust
  • Termination criteria involve checking norm of gradient, change in function value, or step size against predefined tolerances
  • Example termination condition: $\|\nabla f(x_k)\| < \epsilon$ or $|f(x_k) - f(x_{k-1})| < \delta$ (see the snippet after this list)
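
A hedged illustration of these safeguards; the curvature threshold, the re-symmetrization step, and the tolerance names are typical choices rather than fixed standards.

```python
import numpy as np

def safe_update(H, s, y, update, curvature_tol=1e-10):
    """Apply a quasi-Newton update only when it is numerically safe."""
    if y @ s <= curvature_tol * np.linalg.norm(s) * np.linalg.norm(y):
        return H                       # skip: curvature condition (nearly) violated
    H_new = update(H, s, y)
    return 0.5 * (H_new + H_new.T)     # re-symmetrize to counter round-off drift

def converged(grad_x, f_curr, f_prev, step, eps=1e-6, delta=1e-9):
    """Typical termination tests: gradient norm, function change, and step size."""
    return (np.linalg.norm(grad_x) < eps
            or abs(f_curr - f_prev) < delta
            or np.linalg.norm(step) < delta)
```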

BFGS vs DFP Performance

Convergence Properties

  • BFGS and DFP exhibit superlinear convergence under certain conditions
  • BFGS converges faster and more reliably in practice
  • BFGS demonstrates better self-correcting properties
  • Recovers from poor approximations more effectively than DFP
  • Both methods sensitive to accuracy of line search
  • BFGS somewhat less sensitive than DFP

Practical Considerations

  • BFGS generally considered more robust and efficient than DFP
  • Particularly effective for problems with inaccurate line searches or ill-conditioned Hessians
  • Memory requirements similar for BFGS and DFP (O(n^2) storage for inverse Hessian approximation)
  • Numerical experiments show BFGS outperforms DFP in number of function evaluations and iterations to reach convergence
  • Example optimization problem: Rosenbrock function minimization (a comparison sketch follows this list)
  • BFGS typically requires fewer iterations (20-30) compared to DFP (30-40) for convergence
  • Relative performance varies depending on specific optimization problem
  • Rare cases exist where DFP may outperform BFGS (highly nonlinear problems with specific structures)
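
One way to see the practical difference is to run both updates on the Rosenbrock function (available as `scipy.optimize.rosen` and `rosen_der`), reusing the `quasi_newton` loop sketched earlier; exact iteration counts depend on the line search and starting point, so treat the figures above as typical rather than guaranteed.

```python
import numpy as np
from scipy.optimize import rosen, rosen_der

# Assumes quasi_newton, bfgs_inverse_update, and dfp_inverse_update
# from the earlier sketch are in scope.
x0 = np.array([-1.2, 1.0])                  # classic Rosenbrock starting point
x_bfgs = quasi_newton(rosen, rosen_der, x0, bfgs_inverse_update)
x_dfp = quasi_newton(rosen, rosen_der, x0, dfp_inverse_update)
print("BFGS solution:", x_bfgs)             # both should approach the minimizer (1, 1)
print("DFP solution:", x_dfp)
```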

