In a previous post we looked at root-finding methods for single-variable equations. In this post we’ll extend quasi-Newton methods to the multivariable case and look at one of the more widely used algorithms today: Broyden’s Method.
Broyden’s Good Method
Broyden’s Method is, like the Secant Method and Brent’s Method, another attempt to use information about the derivatives without having to explicitly and expensively calculate them at each iteration. For Broyden’s Method, we begin with an initial estimate of the Jacobian and update it at each iteration based on the new position of our guess vector $x$.
Recall that the first-order Taylor expansion of our function $f$ about the current iterate $x_n$ is:

$$f(x) \approx f(x_n) + J(x_n)\,(x - x_n)$$

where $J$ is the Jacobian (the multivariable analogue of the gradient).

When we set $f(x)$ to zero in order to find the intercept of the tangent plane on the x-axis, we get:

$$0 = f(x_n) + J(x_n)\,(x_{n+1} - x_n)$$

And solving for $x_{n+1}$ gives:

$$x_{n+1} = x_n - J(x_n)^{-1}\,f(x_n)$$
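As a point of reference before bringing in Broyden’s approximation, here is a minimal sketch of one plain Newton step for a two-equation system; the helper name newton_step is mine, and the example system is the same toy problem used later in this post. In practice we solve $J\,s = -f$ rather than explicitly inverting $J$:

import numpy as np

def newton_step(x, f, J):
    # One multivariable Newton step: solve J(x) s = -f(x), then move to x + s.
    s = np.linalg.solve(J(x), -f(x))
    return x + s

# Toy system from this post, with its exact Jacobian.
f = lambda v: np.array([v[0] + 2*v[1] - 2, v[0]**2 + 4*v[1]**2 - 4])
J = lambda v: np.array([[1.0, 2.0], [2*v[0], 8*v[1]]])

x = np.array([1.0, 2.0])
for _ in range(6):
    x = newton_step(x, f, J)
print(x)  # should approach the root [0, 1] from this starting point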
At each iteration, instead of calculating and recalculating the Jacobian, we approximate it with a matrix $J_n$. We simply want to update the values of $J_n$ based on the value of $J_{n-1}$ at the previous iteration and the new value of $x$ we’ve calculated. Broyden’s solution is to minimize the difference between the old and new Jacobian approximations as measured by the Frobenius norm, i.e. to minimize

$$\lVert J_n - J_{n-1} \rVert_F$$

while also imposing the secant condition:

$$J_n\,\Delta x_n = \Delta f_n, \qquad \Delta x_n = x_n - x_{n-1}, \quad \Delta f_n = f(x_n) - f(x_{n-1})$$

The minimization of the norm just helps make sure the Jacobian doesn’t change too rapidly, while imposing the secant condition makes sense; revisiting it, it basically says “the rate of change (Jacobian) of the function multiplied by a change in input had better equal the change in output of the function over that same change in input,” i.e. $J_n\,\Delta x_n = \Delta f_n$.
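For concreteness, here is a tiny numerical illustration of those two ingredients; the matrices and vectors below are made-up example values, not anything produced by the algorithm:

import numpy as np

# Hypothetical old and new Jacobian approximations plus a step, purely for illustration.
J_old = np.array([[1.0, 2.0], [2.0, 16.0]])
J_new = np.array([[1.2, 2.1], [1.8, 15.5]])
dx = np.array([0.5, -0.25])   # change in input,  x_n - x_{n-1}
df = np.array([0.1, -2.0])    # change in output, f(x_n) - f(x_{n-1})

# The quantity Broyden keeps small: how much the approximation changed.
print(np.linalg.norm(J_new - J_old, 'fro'))

# The secant condition asks that J_new applied to dx reproduce df.
print(J_new.dot(dx), df)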
Using a Lagrange multiplier $\lambda$ and writing the constraint as $\Delta f_n - J_n\,\Delta x_n = 0$, we’re now finding the critical points of the Lagrangian $\mathcal{L}$:

$$\mathcal{L}(J_n, \lambda) = \tfrac{1}{2}\,\lVert J_n - J_{n-1} \rVert_F^2 + \lambda^T\,(\Delta f_n - J_n\,\Delta x_n)$$

Differentiating w.r.t. $J_n$ and setting the result to zero gives:

$$J_n - J_{n-1} - \lambda\,\Delta x_n^T = 0 \quad\Rightarrow\quad J_n = J_{n-1} + \lambda\,\Delta x_n^T$$

Substituting into the secant condition $J_n\,\Delta x_n = \Delta f_n$ gives a multiplier of

$$\lambda = \frac{\Delta f_n - J_{n-1}\,\Delta x_n}{\Delta x_n^T\,\Delta x_n},$$

which gives us our update rule for the Jacobian:

$$J_n = J_{n-1} + \frac{(\Delta f_n - J_{n-1}\,\Delta x_n)\,\Delta x_n^T}{\Delta x_n^T\,\Delta x_n}$$
First we have Broyden’s Method (good), which follows the update rule above. For better legibility, the variable names in the code are shorter than the symbols in the update formula: s plays the role of $\Delta x_n$ and z the role of $\Delta f_n$.
from __future__ import division, print_function
import numpy as np
import scipy.linalg as sla

def broyden_good(x, y, f_equations, J_equations, tol=10e-10, maxIters=50):
    steps_taken = 0
    f = f_equations(x, y)
    J = J_equations(x, y)                 # initial estimate of the Jacobian
    while np.linalg.norm(f, 2) > tol and steps_taken < maxIters:
        s = sla.solve(J, -1 * f)          # Newton-style step: solve J s = -f
        x = x + s[0]
        y = y + s[1]
        newf = f_equations(x, y)
        z = newf - f                      # change in f over the step
        # Broyden's (good) rank-one update of the Jacobian approximation
        J = J + np.outer(z - np.dot(J, s), s) / np.dot(s, s)
        f = newf
        steps_taken += 1
    return steps_taken, x, y

tol = 10.0 ** -15
maxIters = 50
x0 = 1
y0 = 2

def fs(x, y):
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

def Js(x, y):
    # Initial guess for the Jacobian (here, the exact Jacobian at the starting point (1, 2))
    return np.array([[1, 2], [2, 16]])

n, x, y = broyden_good(x0, y0, fs, Js, tol, maxIters=50)

print("iterations: ", n)
print("x and y: ", x, y)
iterations:  8
x and y:  8.15906707552e-17 1.0
For reference, the correct solution to this system of equations

$$x + 2y - 2 = 0$$
$$x^2 + 4y^2 - 4 = 0$$

is $[0, 1]$ (the system has a second root at $[2, 0]$, but from this starting point the method converges to $[0, 1]$).
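A quick sanity check, reusing the fs defined above, confirms that the residuals vanish at both roots:

import numpy as np

def fs(x, y):
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

print(fs(0, 1))  # [0 0]
print(fs(2, 0))  # [0 0] -- the system's other root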
Broyden’s Bad Method
I’m not clear on the reason for calling one good and the other bad; so far as I know Broyden and other early users obtained better numerical accuracy using the above version. However, the “bad” version that follows carries with it an important concept that is used for lots of optimization algorithms.
In computing our updates to $x$ via

$$x_{n+1} = x_n - J_n^{-1}\,f(x_n),$$

we had to invert the matrix $J_n$ (or, equivalently, solve a linear system with it) at each iteration. (Remember that matrix inversion is an $O(n^3)$ operation.) The motivation behind Broyden’s bad algorithm is: instead of doing that expensive computation at each step, what if we just store and update the inverse Jacobian at each iteration?
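The key tool is the Sherman–Morrison formula: for an invertible matrix $A$ and vectors $u, v$ with $1 + v^T A^{-1} u \neq 0$,

$$(A + u\,v^T)^{-1} = A^{-1} - \frac{A^{-1}\,u\,v^T\,A^{-1}}{1 + v^T\,A^{-1}\,u},$$

so a rank-one change to a matrix translates into a cheap, $O(n^2)$ rank-one change to its inverse. Here is a small sketch of that formula in isolation; the matrix and vectors are arbitrary example values, and the helper name sherman_morrison is mine:

import numpy as np

def sherman_morrison(A_inv, u, v):
    # Inverse of (A + u v^T) given A_inv: only matrix-vector products, no fresh inversion.
    Au = A_inv.dot(u)
    vA = v.dot(A_inv)
    return A_inv - np.outer(Au, vA) / (1.0 + v.dot(Au))

# Arbitrary example values, just to check against a direct inversion.
A = np.array([[3.0, 1.0], [0.0, 2.0]])
u = np.array([0.5, -1.0])
v = np.array([1.0, 2.0])

direct = np.linalg.inv(A + np.outer(u, v))
updated = sherman_morrison(np.linalg.inv(A), u, v)
print(np.allclose(direct, updated))  # True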
The result is a much faster algorithm with a different update rule, obtained by applying the Sherman–Morrison formula to the rank-one update above:

$$J_n^{-1} = J_{n-1}^{-1} + \frac{\left(\Delta x_n - J_{n-1}^{-1}\,\Delta f_n\right)\,\Delta x_n^T\,J_{n-1}^{-1}}{\Delta x_n^T\,J_{n-1}^{-1}\,\Delta f_n}$$

where $\Delta x_n = x_n - x_{n-1}$ and $\Delta f_n = f(x_n) - f(x_{n-1})$.
from __future__ import division, print_function
import numpy as np
from numpy.linalg import inv

def broyden_bad(x, y, f_equations, J_equations, tol=10e-10, maxIters=50):
    steps_taken = 0
    f = f_equations(x, y)
    J = inv(J_equations(x, y))            # we now store the INVERSE Jacobian approximation
    while np.linalg.norm(f, 2) > tol and steps_taken < maxIters:
        s = J.dot(f_equations(x, y))      # step is J^{-1} f, no linear solve needed
        x = x - s[0]
        y = y - s[1]
        newf = f_equations(x, y)
        z = newf - f                      # change in f over the step
        u = J.dot(z)
        d = -1 * s                        # change in x over the step
        # Sherman-Morrison rank-one update of the inverse Jacobian approximation
        J = J + np.outer(d - u, d).dot(J) / np.dot(d, u)
        f = newf
        steps_taken += 1
    return steps_taken, x, y

tol = 10.0 ** -15
maxIters = 50
x0 = 1
y0 = 2

def fs(x, y):
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

def Js(x, y):
    return np.array([[1, 2], [2, 16]])

n, x, y = broyden_bad(x0, y0, fs, Js, tol, maxIters=50)

print("iterations: ", n)
print("x and y: ", x, y)
iterations:  8
x and y:  1.81816875617e-16 1.0
Hi, I am trying to implement a similar algorithm to solve a 3D problem. However, using the broyden1 function in Python does not show convergence, so I just want to check whether the problem is with the function that I defined. Thank you for this valuable example; I am trying to implement this Broyden good method manually in my code. But I have a doubt: what is the definition of the function Js that you have defined in the code? Would be glad to receive your help.
Hi Joseph,
J is an approximation of the Jacobian, and in order to start Broyden’s method we need an initial guess of J, so “Js” is that initial guess in the toy example I provide. Note that if the initial guess of J is too far off then the algorithm might not converge to the correct solution, so a weakness of this method is its dependence on a good initial guess for J. I would recommend using something from a reputable library like scipy.optimize for your program; if you still want to write it by hand, note that scipy’s implementation of Broyden’s method is built for production, whereas mine is built for demonstration. Take a look there and see what they do to make the method more robust and ensure the initial guess of J is a good one. Good luck.
https://github.com/scipy/scipy/blob/master/scipy/optimize/nonlin.py
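For reference, here is a minimal sketch of calling SciPy’s built-in implementation (scipy.optimize.broyden1) on the toy system from this post; the f_tol value below is just an illustrative choice:

import numpy as np
from scipy.optimize import broyden1

# Same toy system as in the post, written in vector form for scipy.
def F(v):
    x, y = v
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

root = broyden1(F, [1.0, 2.0], f_tol=1e-12)
print(root)  # should land near one of the system's roots, e.g. approximately [0, 1]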