Line search methods are central to unconstrained optimization: at each iteration they choose a search direction and then a step size along it that moves the current point toward a minimum. These methods balance computational cost against convergence speed, making them practical for many real-world problems.
Steepest descent and Newton's method are two key line search approaches. Quasi-Newton methods, like BFGS, offer a middle ground: they approximate the Hessian to avoid its costly computation and inversion while still converging quickly in most cases.
Search Direction and Step Size
Fundamentals of Line Search Methods
- Search direction determines the path of optimization in multidimensional space
- Step size controls the magnitude of movement along the search direction
- Line search methods iterate between choosing a direction and determining an appropriate step size (see the loop sketch after this list)
- Optimization process continues until convergence criteria are met (gradient norm below threshold)
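A minimal sketch of this loop, assuming the caller supplies a method-specific direction rule and a line search routine (the callback names `direction_fn` and `step_fn` are illustrative placeholders, not from any particular library):

```python
import numpy as np

def line_search_minimize(f, grad, x0, direction_fn, step_fn, tol=1e-6, max_iter=1000):
    """Generic line search loop: pick a direction, pick a step size, repeat."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # convergence test: gradient norm below threshold
            break
        d = direction_fn(x, g)           # search direction (method-specific)
        alpha = step_fn(f, grad, x, d)   # step size from a line search procedure
        x = x + alpha * d                # move along the search direction
    return x
```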
Steepest Descent and Newton's Method
- Steepest descent method uses negative gradient as search direction
- Computationally inexpensive but can be slow to converge
- Formula: $d_k = -\nabla f(x_k)$
- Performs well for functions with circular contours
- Newton's method utilizes both gradient and Hessian information
- Faster convergence rate, especially near the optimum
- Search direction given by: $d_k = -H_k^{-1} \nabla f(x_k)$, where $H_k = \nabla^2 f(x_k)$ is the Hessian
- Requires computation and inversion of Hessian matrix
- Both methods update the current point using: $x_{k+1} = x_k + \alpha_k d_k$ (see the sketch after this list)
- $\alpha_k$ represents the step size
- Determined through line search procedures (exact or inexact)
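A sketch comparing the two search directions on a small quadratic (the test function, starting point, and fixed step size below are chosen purely for illustration):

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 * x^T A x - b^T x with known gradient and Hessian
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
hess = lambda x: A  # constant Hessian for a quadratic

x = np.array([5.0, 5.0])

# Steepest descent direction: d_k = -grad f(x_k)
d_sd = -grad(x)

# Newton direction: d_k = -H_k^{-1} grad f(x_k); solve the linear system instead of inverting
d_newton = np.linalg.solve(hess(x), -grad(x))

# Both methods update via x_{k+1} = x_k + alpha_k * d_k, with alpha_k from a line search
alpha = 1.0                      # Newton's natural step; steepest descent usually needs a searched alpha
x_newton = x + alpha * d_newton  # for this quadratic, the full Newton step lands on the exact minimizer
```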
Quasi-Newton Methods
Principles and Advantages
- Quasi-Newton methods approximate the Hessian matrix or its inverse
- Balance between computational efficiency and convergence speed
- Avoid explicit calculation and inversion of the Hessian matrix
- Update formula maintains positive definiteness of approximation
- Superlinear convergence rate in many practical applications
BFGS Method Implementation
- Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm widely used in practice
- Maintains an approximation of the Hessian matrix (an equivalent form of the update works with the inverse Hessian directly, avoiding linear solves)
- BFGS update formula: $B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}$ (sketched after this list)
- $B_k$ represents the current approximation of the Hessian
- $s_k = x_{k+1} - x_k$ denotes the step taken
- $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ measures the gradient change
- Limited-memory BFGS (L-BFGS) variant for large-scale problems
- Stores only a fixed number of vector pairs
- Reduces memory requirements for high-dimensional optimization tasks
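A sketch of the update formula above applied to a Hessian approximation; the skipped update when $y_k^T s_k$ is nearly zero is a common practical safeguard assumed here, not part of the formula itself:

```python
import numpy as np

def bfgs_update(B, s, y, eps=1e-10):
    """One BFGS update of the Hessian approximation B from step s and gradient change y.

    B_{k+1} = B_k + (y y^T)/(y^T s) - (B s s^T B)/(s^T B s)
    """
    ys = y @ s
    if ys <= eps:            # y^T s > 0 is needed to keep B positive definite
        return B             # skip the update rather than break positive definiteness
    Bs = B @ s
    return B + np.outer(y, y) / ys - np.outer(Bs, Bs) / (s @ Bs)

# Usage with quantities from two consecutive iterates:
# s = x_next - x                          # step taken
# y = grad(x_next) - grad(x)              # gradient change
# B = bfgs_update(B, s, y)
# d = np.linalg.solve(B, -grad(x_next))   # quasi-Newton search direction
```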
Line Search Conditions
Armijo Rule and Sufficient Decrease
- Armijo rule ensures sufficient decrease in objective function value
- Condition: $f(x_k + \alpha_k d_k) \leq f(x_k) + c_1 \alpha_k \nabla f(x_k)^T d_k$
- $c_1$ typically set to a small value (0.0001 to 0.1)
- Prevents excessively large steps that may increase function value
- Often combined with backtracking to find a suitable step size (see the sketch after this list)
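A minimal backtracking sketch that enforces the Armijo condition; the initial step $\alpha_0 = 1$, shrink factor $\rho = 0.5$, and $c_1 = 10^{-4}$ are common defaults assumed for illustration:

```python
def backtracking_armijo(f, grad, x, d, alpha0=1.0, rho=0.5, c1=1e-4, max_shrinks=50):
    """Shrink alpha until f(x + alpha d) <= f(x) + c1 * alpha * grad(x)^T d."""
    alpha = alpha0
    fx = f(x)
    slope = grad(x) @ d                 # directional derivative; negative for a descent direction
    for _ in range(max_shrinks):
        if f(x + alpha * d) <= fx + c1 * alpha * slope:
            return alpha                # sufficient decrease achieved
        alpha *= rho                    # otherwise shrink the step and try again
    return alpha
```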
Wolfe Conditions for Step Size Selection
- Wolfe conditions comprise two inequalities for step size selection
- Sufficient decrease condition (Armijo rule)
- Curvature condition: $\nabla f(x_k + \alpha_k d_k)^T d_k \geq c_2 \nabla f(x_k)^T d_k$ (see the check sketched after this list)
- $c_2$ typically chosen between 0.1 and 0.9
- Strong Wolfe conditions use absolute value in curvature condition
- Ensure reasonable progress and avoid excessively small steps
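A sketch that checks whether a candidate step satisfies the (strong) Wolfe conditions, assuming the typical constants $c_1 = 10^{-4}$ and $c_2 = 0.9$:

```python
def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9, strong=False):
    """Return True if alpha satisfies the (strong) Wolfe conditions along direction d."""
    slope0 = grad(x) @ d                                      # grad f(x_k)^T d_k
    x_new = x + alpha * d
    armijo = f(x_new) <= f(x) + c1 * alpha * slope0           # sufficient decrease (Armijo)
    slope_new = grad(x_new) @ d                               # grad f(x_k + alpha d_k)^T d_k
    if strong:
        curvature = abs(slope_new) <= c2 * abs(slope0)        # strong Wolfe: absolute values
    else:
        curvature = slope_new >= c2 * slope0                  # standard curvature condition
    return armijo and curvature
```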
Backtracking Line Search Algorithm