In a previous post we looked at root-finding methods for single-variable equations. In this post we’ll extend quasi-Newton methods to the multivariable case and look at one of the more widely used algorithms today: Broyden’s Method.
Broyden’s Good Method
Broyden’s Method is, like the Secant Method and Brent’s Method, another attempt to use information about the derivatives without having to explicitly and expensively calculate them at each iteration. For Broyden’s Method, we begin with an initial estimate of the Jacobian and update it at each iteration based on the new position of our guess vector $x$.
Recall that the first-order Taylor expansion of our function $f$ about the current iterate $x_n$ is:

$$f(x) \approx f(x_n) + J(x_n)\,(x - x_n)$$

where $J$ is the Jacobian (the multivariable analogue of the gradient).

When we set $f(x)$ to zero in order to find the intercept of the tangent plane on the x-axis, we get:

$$0 = f(x_n) + J(x_n)\,(x_{n+1} - x_n)$$

And solving for $x_{n+1}$ gives:

$$x_{n+1} = x_n - J(x_n)^{-1}\,f(x_n)$$
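As a point of reference before bringing in Broyden’s approximation, here is a minimal sketch of one plain Newton step for a two-equation system; the helper name newton_step is mine, and the example system is the same toy problem used later in this post. In practice we solve $J\,s = -f$ rather than explicitly inverting $J$:

import numpy as np

def newton_step(x, f, J):
    # One multivariable Newton step: solve J(x) s = -f(x), then move to x + s.
    s = np.linalg.solve(J(x), -f(x))
    return x + s

# Toy system from this post, with its exact Jacobian.
f = lambda v: np.array([v[0] + 2*v[1] - 2, v[0]**2 + 4*v[1]**2 - 4])
J = lambda v: np.array([[1.0, 2.0], [2*v[0], 8*v[1]]])

x = np.array([1.0, 2.0])
for _ in range(6):
    x = newton_step(x, f, J)
print(x)  # should approach the root [0, 1] from this starting point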
At each iteration, instead of calculating and recalculating the Jacobian, we approximate it with a matrix $J_n$. We simply want to update the values of $J_n$ based on the value of $J_{n-1}$ at the previous iteration and the new value of $x$ we’ve calculated. Broyden’s solution is to minimize the difference between the old and new Jacobian approximations as measured by the Frobenius norm, i.e. to minimize

$$\lVert J_n - J_{n-1} \rVert_F$$

while also imposing the secant condition:

$$J_n\,\Delta x_n = \Delta f_n, \qquad \Delta x_n = x_n - x_{n-1}, \quad \Delta f_n = f(x_n) - f(x_{n-1})$$

The minimization of the norm just helps make sure the Jacobian doesn’t change too rapidly, while imposing the secant condition makes sense; revisiting it, it basically says “the rate of change (Jacobian) of the function multiplied by a change in input had better equal the change in output of the function over that same change in input,” i.e. $J_n\,\Delta x_n = \Delta f_n$.
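For concreteness, here is a tiny numerical illustration of those two ingredients; the matrices and vectors below are made-up example values, not anything produced by the algorithm:

import numpy as np

# Hypothetical old and new Jacobian approximations plus a step, purely for illustration.
J_old = np.array([[1.0, 2.0], [2.0, 16.0]])
J_new = np.array([[1.2, 2.1], [1.8, 15.5]])
dx = np.array([0.5, -0.25])   # change in input,  x_n - x_{n-1}
df = np.array([0.1, -2.0])    # change in output, f(x_n) - f(x_{n-1})

# The quantity Broyden keeps small: how much the approximation changed.
print(np.linalg.norm(J_new - J_old, 'fro'))

# The secant condition asks that J_new applied to dx reproduce df.
print(J_new.dot(dx), df)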
Using a Lagrange multiplier $\lambda$ and writing the constraint as $\Delta f_n - J_n\,\Delta x_n = 0$, we’re now finding the critical points of the Lagrangian $\mathcal{L}$:

$$\mathcal{L}(J_n, \lambda) = \tfrac{1}{2}\,\lVert J_n - J_{n-1} \rVert_F^2 + \lambda^T\,(\Delta f_n - J_n\,\Delta x_n)$$

Differentiating w.r.t. $J_n$ and setting the result to zero gives:

$$J_n - J_{n-1} - \lambda\,\Delta x_n^T = 0 \quad\Rightarrow\quad J_n = J_{n-1} + \lambda\,\Delta x_n^T$$

Substituting into the secant condition $J_n\,\Delta x_n = \Delta f_n$ gives a multiplier of

$$\lambda = \frac{\Delta f_n - J_{n-1}\,\Delta x_n}{\Delta x_n^T\,\Delta x_n},$$

which gives us our update rule for the Jacobian:

$$J_n = J_{n-1} + \frac{(\Delta f_n - J_{n-1}\,\Delta x_n)\,\Delta x_n^T}{\Delta x_n^T\,\Delta x_n}$$
First we have Broyden’s Method (good), which follows the update rule above. For better legibility, the variable names in the code are shorter than the symbols in the update formula: s plays the role of $\Delta x_n$ and z the role of $\Delta f_n$.
from __future__ import division, print_function
import numpy as np
import scipy.linalg as sla

def broyden_good(x, y, f_equations, J_equations, tol=10e-10, maxIters=50):
    steps_taken = 0
    f = f_equations(x, y)
    J = J_equations(x, y)                 # initial estimate of the Jacobian
    while np.linalg.norm(f, 2) > tol and steps_taken < maxIters:
        s = sla.solve(J, -1 * f)          # Newton-style step: solve J s = -f
        x = x + s[0]
        y = y + s[1]
        newf = f_equations(x, y)
        z = newf - f                      # change in f over the step
        # Broyden's (good) rank-one update of the Jacobian approximation
        J = J + np.outer(z - np.dot(J, s), s) / np.dot(s, s)
        f = newf
        steps_taken += 1
    return steps_taken, x, y

tol = 10.0 ** -15
maxIters = 50
x0 = 1
y0 = 2

def fs(x, y):
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

def Js(x, y):
    # Initial guess for the Jacobian (here, the exact Jacobian at the starting point (1, 2))
    return np.array([[1, 2], [2, 16]])

n, x, y = broyden_good(x0, y0, fs, Js, tol, maxIters=50)

print("iterations: ", n)
print("x and y: ", x, y)
iterations:  8
x and y:  8.15906707552e-17 1.0
For reference, the correct solution to this system of equations

$$x + 2y - 2 = 0$$
$$x^2 + 4y^2 - 4 = 0$$

is $[0, 1]$ (the system has a second root at $[2, 0]$, but from this starting point the method converges to $[0, 1]$).
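A quick sanity check, reusing the fs defined above, confirms that the residuals vanish at both roots:

import numpy as np

def fs(x, y):
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

print(fs(0, 1))  # [0 0]
print(fs(2, 0))  # [0 0] -- the system's other root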
Broyden’s Bad Method
I’m not clear on the reason for calling one good and the other bad; so far as I know Broyden and other early users obtained better numerical accuracy using the above version. However, the “bad” version that follows carries with it an important concept that is used for lots of optimization algorithms.
In computing our updates to $x$ via

$$x_{n+1} = x_n - J_n^{-1}\,f(x_n),$$

we had to invert the matrix $J_n$ (or, equivalently, solve a linear system with it) at each iteration. (Remember that matrix inversion is an $O(n^3)$ operation.) The motivation behind Broyden’s bad algorithm is: instead of doing that expensive computation at each step, what if we just store and update the inverse Jacobian at each iteration?
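The key tool is the Sherman–Morrison formula: for an invertible matrix $A$ and vectors $u, v$ with $1 + v^T A^{-1} u \neq 0$,

$$(A + u\,v^T)^{-1} = A^{-1} - \frac{A^{-1}\,u\,v^T\,A^{-1}}{1 + v^T\,A^{-1}\,u},$$

so a rank-one change to a matrix translates into a cheap, $O(n^2)$ rank-one change to its inverse. Here is a small sketch of that formula in isolation; the matrix and vectors are arbitrary example values, and the helper name sherman_morrison is mine:

import numpy as np

def sherman_morrison(A_inv, u, v):
    # Inverse of (A + u v^T) given A_inv: only matrix-vector products, no fresh inversion.
    Au = A_inv.dot(u)
    vA = v.dot(A_inv)
    return A_inv - np.outer(Au, vA) / (1.0 + v.dot(Au))

# Arbitrary example values, just to check against a direct inversion.
A = np.array([[3.0, 1.0], [0.0, 2.0]])
u = np.array([0.5, -1.0])
v = np.array([1.0, 2.0])

direct = np.linalg.inv(A + np.outer(u, v))
updated = sherman_morrison(np.linalg.inv(A), u, v)
print(np.allclose(direct, updated))  # True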
The result is a much faster algorithm with a different update rule, obtained by applying the Sherman–Morrison formula to the rank-one update above:

$$J_n^{-1} = J_{n-1}^{-1} + \frac{\left(\Delta x_n - J_{n-1}^{-1}\,\Delta f_n\right)\,\Delta x_n^T\,J_{n-1}^{-1}}{\Delta x_n^T\,J_{n-1}^{-1}\,\Delta f_n}$$

where $\Delta x_n = x_n - x_{n-1}$ and $\Delta f_n = f(x_n) - f(x_{n-1})$.
from __future__ import division, print_function
import numpy as np
from numpy.linalg import inv

def broyden_bad(x, y, f_equations, J_equations, tol=10e-10, maxIters=50):
    steps_taken = 0
    f = f_equations(x, y)
    J = inv(J_equations(x, y))            # we now store the INVERSE Jacobian approximation
    while np.linalg.norm(f, 2) > tol and steps_taken < maxIters:
        s = J.dot(f_equations(x, y))      # step is J^{-1} f, no linear solve needed
        x = x - s[0]
        y = y - s[1]
        newf = f_equations(x, y)
        z = newf - f                      # change in f over the step
        u = J.dot(z)
        d = -1 * s                        # change in x over the step
        # Sherman-Morrison rank-one update of the inverse Jacobian approximation
        J = J + np.outer(d - u, d).dot(J) / np.dot(d, u)
        f = newf
        steps_taken += 1
    return steps_taken, x, y

tol = 10.0 ** -15
maxIters = 50
x0 = 1
y0 = 2

def fs(x, y):
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

def Js(x, y):
    return np.array([[1, 2], [2, 16]])

n, x, y = broyden_bad(x0, y0, fs, Js, tol, maxIters=50)

print("iterations: ", n)
print("x and y: ", x, y)
iterations:  8
x and y:  1.81816875617e-16 1.0
Hi, I am trying to implement a similar algorithm to solve a 3D problem. However, using the broyden1 function in Python does not show convergence, so I just want to check whether the problem is with the function that I defined. Thank you for this valuable example; I am trying to implement this Broyden good method manually in my code. But I have a doubt: what is the definition of the function Js that you have defined in the code? Would be glad to receive your help.
Hi Joseph,
J is an approximation of the Jacobian, and in order to start Broyden’s method we need an initial guess of J, so “Js” is that initial guess in the toy example I provide. Note that if the initial guess of J is too far off then the algorithm might not converge to the correct solution, so a weakness of this method is its dependence on a good initial guess for J. I would recommend using something from a reputable library like scipy.optimize for your program; if you still want to write it by hand, note that scipy’s implementation of Broyden’s method is built for production, whereas mine is built for demonstration. Take a look there and see what they do to make the method more robust and ensure the initial guess of J is a good one. Good luck.
https://github.com/scipy/scipy/blob/master/scipy/optimize/nonlin.py
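For reference, here is a minimal sketch of calling SciPy’s built-in implementation (scipy.optimize.broyden1) on the toy system from this post; the f_tol value below is just an illustrative choice:

import numpy as np
from scipy.optimize import broyden1

# Same toy system as in the post, written in vector form for scipy.
def F(v):
    x, y = v
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])

root = broyden1(F, [1.0, 2.0], f_tol=1e-12)
print(root)  # should land near one of the system's roots, e.g. approximately [0, 1]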