Broyden’s Method in Python

In a previous post we looked at root-finding methods for single-variable equations. In this post we’ll extend Quasi-Newton methods to the multivariable case and look at one of the more widely-used algorithms today: Broyden’s Method.

Broyden’s Good Method

Broyden’s Method is, like the Secant Method and Brent’s Method, another attempt to use derivative information without having to explicitly and expensively recalculate the derivatives at each iteration. For Broyden’s Method, we begin with an initial estimate of the Jacobian and update it at each iteration based on the new position of our guess vector x.

Recall that the first-order Taylor expansion of our function f is:

f(x) \approx f(x_k) + \nabla f(x_k)^T (x - x_k)

where \nabla f is the Jacobian (the gradient, in the single-output case).

When we set f(x) to zero in order to find the intercept of the tangent plane on the x-axis, we get:

0 = f(x_k) + \nabla f(x_k)^T (x_{k+1} - x_k)

And solving for x_{k+1} gives:

x_{k+1} = x_k - [\nabla f(x_k)]^{-1} f(x_k)
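
As a concrete sketch (using the example system solved later in this post), one Newton step can be computed by solving the linear system J s = -f rather than forming the inverse explicitly:

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

def jacobian(v):
    x, y = v
    return np.array([[1.0, 2.0],
                     [2*x, 8*y]])

v = np.array([1.0, 2.0])                    # initial guess
step = np.linalg.solve(jacobian(v), -f(v))  # solve J s = -f instead of inverting J
v_next = v + step                           # one step moves (1, 2) to roughly (-0.833, 1.417)
print(v_next)
```

Each such step requires the true Jacobian; Broyden’s Method replaces it with a cheap running approximation.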

At each iteration, instead of calculating and recalculating the Jacobian, we approximate it with a matrix J. We simply want to update J based on its value at the previous iteration and the new value of x we’ve calculated. Broyden’s solution is to minimize the difference between the old and new Jacobian approximations as measured by the Frobenius Norm:

||J||_{Frobenius} = ( \sum_{i=1}^m \sum_{j=1}^n |J_{ij}|^2 ) ^{1/2}

i.e. to minimize ||J_k - J_{k-1}||_{Frobenius}

while also imposing the secant condition:

J_k (x_k - x_{k-1}) = f(x_k) - f(x_{k-1})

Minimizing the norm keeps the Jacobian approximation from changing more than it has to between iterations, while the secant condition itself is intuitive: it says that the rate of change (the Jacobian) multiplied by a change in input had better equal the change in the function’s output for that same change in input, or J \Delta x = \Delta f(x).
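
In more than one dimension the secant condition alone does not pin J down: for an n-dimensional problem it supplies only n equations for the n² entries of J, which is why the minimal-change criterion is needed at all. A tiny illustration with made-up numbers:

```python
import numpy as np

dx = np.array([1.0, 0.0])   # a change in input (arbitrary, for illustration)
df = np.array([2.0, 3.0])   # the observed change in output

# Any matrix whose first column equals df satisfies J dx = df here,
# so infinitely many Jacobians are consistent with this one secant pair.
J1 = np.array([[2.0, 0.0], [3.0, 0.0]])
J2 = np.array([[2.0, 5.0], [3.0, -1.0]])
print(J1 @ dx, J2 @ dx)   # both equal df
```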

Using a Lagrange multiplier \lambda and writing d = f(x_k) - f(x_{k-1}) - J_{k-1} \Delta x, so that the secant condition becomes \Delta J \Delta x = d, we’re now finding the critical points of the Lagrangian \Lambda:

\Lambda = ||\Delta J||_{Frobenius}^2 + \lambda ^T (\Delta J \Delta x - d)

Differentiating w.r.t. \Delta J and setting the result to zero gives:

0 = 2 (\Delta J) + \lambda (\Delta x)^T

\Delta J = \frac {-1}{2} \lambda (\Delta x)^T

Substituting this back into \Delta J \Delta x = d gives a multiplier of \lambda = \frac {-2d}{||\Delta x||^2}, which gives:

\Delta J = \frac {d(\Delta x)^T}  {||\Delta x||^2}

This gives us our update rule for the Jacobian:

J_k = J_{k-1} + \Delta J

J_k = J_{k-1} + \frac {d(\Delta x)^T}  {||\Delta x||^2}

J_k = J_{k-1} + \frac {(f(x_k) - f(x_{k-1}) - J_{k-1} \Delta x)(x_k - x_{k-1})^T} {||x_k - x_{k-1}||^2}
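
A quick numerical check with made-up values confirms that this rank-one update satisfies the secant condition J_k \Delta x = \Delta f exactly:

```python
import numpy as np

J_prev = np.array([[1.0, 2.0],
                   [2.0, 16.0]])
dx = np.array([0.5, -0.25])   # x_k - x_{k-1} (arbitrary, for illustration)
df = np.array([0.1, 0.9])     # f(x_k) - f(x_{k-1}) (arbitrary)

# Broyden's rank-one update
J_new = J_prev + np.outer(df - J_prev @ dx, dx) / (dx @ dx)

print(J_new @ dx, df)   # the two should match exactly
```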

First we have Broyden’s Method (good), which follows the update rule above. For legibility, the code uses shorter variable names than the formulas above.

import numpy as np
import scipy.linalg as sla

def broyden_good(x, y, f_equations, J_equations, tol=10e-10, maxIters=50):
    steps_taken = 0

    f = f_equations(x,y)
    J = J_equations(x,y)   # initial guess for the Jacobian

    while np.linalg.norm(f,2) > tol and steps_taken < maxIters:

        # Solve J s = -f for the step s (cheaper and more stable
        # than explicitly inverting J)
        s = sla.solve(J, -1*f)

        x = x + s[0]
        y = y + s[1]
        newf = f_equations(x,y)
        z = newf - f

        # Broyden's rank-one update: J += (df - J dx) dx^T / ||dx||^2
        J = J + (np.outer(z - np.dot(J,s), s)) / np.dot(s,s)

        f = newf
        steps_taken += 1

    return steps_taken, x, y

tol = 10.0**-15
maxIters = 50
x0 = 1
y0 = 2

def fs(x,y):
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

def Js(x,y):
    # Initial guess for the Jacobian (here: the exact Jacobian at (x0, y0))
    return np.array([[1, 2],
                     [2, 16]])

n, x, y = broyden_good(x0, y0, fs, Js, tol, maxIters=50)
print("iterations:", n)
print("x and y:", x, y)

iterations:  8
x and y:  8.15906707552e-17 1.0

For reference, the system of equations:

f_1(x, y) = x + 2y - 2

f_2(x, y) = x^2 + 4y^2 - 4

has two solutions, (0, 1) and (2, 0); the iteration above converges to (0, 1).
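
As a sanity check, the same system can be handed to a production root finder. A minimal sketch using scipy.optimize.fsolve (a MINPACK hybrid method, used here only as an independent check; scipy.optimize.broyden1 is SciPy’s own Broyden implementation):

```python
import numpy as np
from scipy.optimize import fsolve

def f(v):
    x, y = v
    return [x + 2*y - 2, x**2 + 4*y**2 - 4]

root = fsolve(f, [1.0, 2.0])   # same starting point as the hand-rolled version
print(root, f(root))           # residual should be ~0 at whichever root it finds
```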

Broyden’s Bad Method

I’m not clear on the reason for calling one good and the other bad; as far as I know, Broyden and other early users obtained better numerical accuracy using the version above. However, the “bad” version that follows carries with it an important concept that is used in lots of optimization algorithms.

In computing our updates to x via:

x_{k+1} = x_k - [\nabla f(x_k)]^{-1} f(x_k)

We had to invert the matrix J (or, equivalently, solve a linear system involving J) at each iteration. (Remember that matrix inversion is an O(n^3) operation.) The motivation behind Broyden’s bad algorithm is: instead of doing that expensive computation at each step, what if we just store and update the inverse Jacobian at each iteration?

The result is a much faster algorithm with a different update rule obtained from the Sherman-Morrison formula:

J_k = J_{k-1} + \frac {(d - u) d^T J_{k-1}} {d^T u}

where J now denotes the approximation to the inverse Jacobian, d = x_k - x_{k-1}, and u = J_{k-1} [f(x_k) - f(x_{k-1})].
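
The Sherman-Morrison formula states that for an invertible matrix A and vectors u, v (with 1 + v^T A^{-1} u \neq 0), (A + uv^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}, so a rank-one change to a matrix translates into a cheap O(n^2) rank-one change to its inverse. A quick numerical check with made-up values:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 16.0]])
u = np.array([0.3, -0.1])
v = np.array([0.5, 0.7])

A_inv = np.linalg.inv(A)

# Sherman-Morrison: update the inverse directly, O(n^2)
denom = 1.0 + v @ A_inv @ u
updated_inv = A_inv - np.outer(A_inv @ u, v @ A_inv) / denom

# Compare against inverting the updated matrix from scratch, O(n^3)
direct_inv = np.linalg.inv(A + np.outer(u, v))
print(np.allclose(updated_inv, direct_inv))   # the two inverses agree
```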

import numpy as np
from numpy.linalg import inv

def broyden_bad(x, y, f_equations, J_equations, tol=10e-10, maxIters=50):
    steps_taken = 0

    f = f_equations(x,y)

    # Store and update the approximation to the *inverse* Jacobian
    J = inv(J_equations(x,y))

    while np.linalg.norm(f,2) > tol and steps_taken < maxIters:

        s = J.dot(f_equations(x,y))

        x = x - s[0]
        y = y - s[1]

        newf = f_equations(x,y)
        z = newf - f

        u = J.dot(z)   # u = J * delta_f
        d = -1 * s     # d = delta_x

        # Rank-one Sherman-Morrison update of the inverse Jacobian:
        # J += (d - u) d^T J / (d^T u)
        J = J + np.dot(np.outer(d - u, d), J) / np.dot(d, u)

        f = newf
        steps_taken += 1

    return steps_taken, x, y

tol = 10.0**-15
maxIters = 50
x0 = 1
y0 = 2

def fs(x,y):
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

def Js(x,y):
    # Initial guess for the Jacobian (here: the exact Jacobian at (x0, y0))
    return np.array([[1, 2],
                     [2, 16]])

n, x, y = broyden_bad(x0, y0, fs, Js, tol, maxIters=50)
print("iterations:", n)
print("x and y:", x, y)
iterations:  8
x and y:  1.81816875617e-16 1.0

 

2 thoughts on “Broyden’s Method in Python”

  1. Hi, I am trying to implement a similar algorithm to solve a 3D problem. However, using the broyden1 function in python does not show convergence. So I just want to check whether it is the problem with the function that I defined. Thank you for this valuable example, I am trying to implement this broyden good method manually in my code. But I have a doubt, what is this definition of function Js that you have defined in the code. Would be glad to receive your help.

    1. Hi Joseph,
      J is an approximation of the Jacobian, and in order to start Broyden’s method we need an initial guess of J, so “Js” is this initial guess in the toy example I provide. Note that if the initial guess of J is too far off then the algorithm might not converge to the correct solution, so a weakness of this method is its dependence on a good initial guess for J. I would recommend using something from a reputable library like scipy.optimize for your program, but if you still want to do it by hand scipy has an implementation of Broyden’s that is built for production vs. mine, which is built for demonstration. Take a look there and see what they do to make the method more robust and ensure the initial guess of J is a good one. Good luck.

      https://github.com/scipy/scipy/blob/master/scipy/optimize/nonlin.py
