Newton’s method - Semidefinite Optimization


4.1.1 Newton’s method

We start by recalling Newton’s method for unconstrained minimization. Newton’s method is an iterative method for finding roots of equations in one or more dimensions. It is one of the most important algorithms in numerical analysis and scientific computing. In convex optimization it can be used to find minimizers of convex differentiable functions. The Newton method is also the fundamental algorithm for the design of fast interior point algorithms.

Unconstrained minimization

Newton’s method is quite general. It is natural to define it in the setting of Banach spaces. Chapter XVIII of the book “Functional analysis in normed spaces” by L.V. Kantorovich and G.P. Akilov is a classical resource for this, which also includes the first thorough analysis of the convergence behavior of Newton’s method. Nowadays every comprehensive book on numerical analysis contains a chapter stating explicit conditions for the convergence speed of Newton’s method.

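To keep it as simple and concrete as possible we define it here only for $\mathbb{R}^n$. Let $\Omega$ be an open set of $\mathbb{R}^n$ and let $f : \Omega \to \mathbb{R}$ be a strictly convex, twice differentiable function. The Taylor approximation of the function $f$ around the point $a$ is

$$f(a + x) = f(a) + \nabla f(a)^T x + \frac{1}{2} x^T \nabla^2 f(a) x + \text{h.o.t.},$$

where $\nabla f(a) \in \mathbb{R}^n$ is the gradient of $f$ at $a$ with entries

$$\nabla f(a) = \left( \frac{\partial}{\partial x_1} f(a), \ldots, \frac{\partial}{\partial x_n} f(a) \right)^T,$$

where $\nabla^2 f(a) \in \mathbb{R}^{n \times n}$ is the Hessian matrix of $f$ at $a$ with entries

$$[\nabla^2 f(a)]_{ij} = \frac{\partial^2}{\partial x_i \partial x_j} f(a),$$

and where h.o.t. stands for “higher order terms”. Since the function is strictly convex, the Hessian matrix is positive semidefinite; here we assume that it is even positive definite, $\nabla^2 f(a) \in \mathcal{S}^n_{\succ 0}$. By $q : \mathbb{R}^n \to \mathbb{R}$ we denote the quadratic function which we get by truncating the above Taylor approximation

$$q(x) = f(a) + \nabla f(a)^T x + \frac{1}{2} x^T \nabla^2 f(a) x.$$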

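This is a strictly convex quadratic function and so it has a unique minimizer $x^* \in \mathbb{R}^n$ which can be determined by setting the gradient of $q$ to zero:

$$0 = \nabla q(x^*) = \left( \frac{\partial}{\partial x_1} q(x^*), \ldots, \frac{\partial}{\partial x_n} q(x^*) \right)^T = \nabla f(a) + \nabla^2 f(a) x^*.$$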

Hence, we find the unique minimizer $x^*$ of $q$ by solving a system of linear equations

$$x^* = -\left( \nabla^2 f(a) \right)^{-1} \nabla f(a).$$

Now Newton’s method is based on approximating the function $f$ locally at a starting point $a$ by the quadratic function $q$, finding the minimizer (the Newton direction) $x^*$ of the quadratic function, updating the starting point to $a + x^*$ and repeating this until the desired accuracy is reached:

repeat
    $x^* \leftarrow -\left( \nabla^2 f(a) \right)^{-1} \nabla f(a)$
    $a \leftarrow a + x^*$
until a stopping criterion is fulfilled.
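The loop above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the text: the test function $f(x, y) = e^{x+y} + x^2 + y^2$, the helper names `grad`, `hess`, `solve2` and `newton`, the tolerance `eps` and the starting point are all chosen here for the example.

```python
import math

def grad(a):
    """Gradient of the assumed test function f(x, y) = exp(x + y) + x^2 + y^2."""
    s = math.exp(a[0] + a[1])
    return [s + 2 * a[0], s + 2 * a[1]]

def hess(a):
    """Hessian matrix of the same test function (positive definite everywhere)."""
    s = math.exp(a[0] + a[1])
    return [[s + 2, s], [s, s + 2]]

def solve2(H, r):
    """Solve the 2x2 linear system H x = r by Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(r[0] * H[1][1] - H[0][1] * r[1]) / det,
            (H[0][0] * r[1] - H[1][0] * r[0]) / det]

def newton(a, eps=1e-20, max_iter=50):
    """Pure Newton iteration: a <- a + x*, where Hess(a) x* = -grad(a)."""
    for _ in range(max_iter):
        g = grad(a)
        if g[0] ** 2 + g[1] ** 2 <= eps:      # stop when the squared gradient norm is small
            break
        x = solve2(hess(a), [-g[0], -g[1]])   # Newton direction x*
        a = [a[0] + x[0], a[1] + x[1]]
    return a

a_min = newton([1.0, -0.5])   # assumed starting point
print(a_min)
```

Note that the sketch solves the linear system $\nabla^2 f(a) x^* = -\nabla f(a)$ instead of forming the inverse of the Hessian matrix explicitly, which is the usual choice in practice.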

The following facts about Newton’s method are important.

First the good news: If the starting point is close to the minimizer, then the Newton method converges quadratically (for instance the sequence $n \mapsto \frac{1}{10^{2^n}}$ converges quadratically to its limit $0$), i.e. in every step the number of accurate digits is multiplied by a constant number.
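Quadratic convergence can be observed numerically. The following sketch is an illustration, not part of the original text: it runs the Newton iteration for the univariate function $f(z) = \frac{1}{4} z^4 - z$ (minimizer $z = 1$) from the assumed starting point $z = 1.5$ and compares each error with the square of the previous one. The function name `newton_errors` is hypothetical.

```python
def newton_errors(z, steps=5):
    """Return the errors |z_k - 1| of the Newton iterates for f(z) = z**4 / 4 - z."""
    errors = []
    for _ in range(steps):
        errors.append(abs(z - 1.0))
        z = z - (z ** 3 - 1.0) / (3.0 * z ** 2)   # z <- z - f'(z) / f''(z)
    return errors

errors = newton_errors(1.5)
for e_prev, e_next in zip(errors, errors[1:]):
    # For quadratic convergence the ratio e_next / e_prev^2 stays bounded.
    print(f"{e_prev:.3e} -> {e_next:.3e}   ratio = {e_next / e_prev ** 2:.3f}")
```

If the ratio in the last column stays bounded, the error is roughly squared in every step, which is what "the number of accurate digits is multiplied by a constant" means here.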

However, if the starting point is not close to the minimizer or if the function is close to being not strictly convex, then Newton’s method does not converge well. Consider for example the convex univariate function $f(z) = \frac{1}{4} z^4 - z$, whose second derivative vanishes at $z = 0$. Then $f'(z) = z^3 - 1$ and $f''(z) = 3z^2$. So if one starts the Newton iteration at $a = 0$, one immediately is in trouble: division by zero. If one starts at $a = -\sqrt[3]{1/2}$, then one can perform a Newton step and one is in trouble again, etc. Figure 4.1 shows the fractal structure which is behind Newton’s method for solving the equation $f'(z) = z^3 - 1 = 0$ in the complex number plane. One has similar figures for other functions.
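The trouble with the starting point $-\sqrt[3]{1/2}$ can be checked by a short calculation, added here as a worked example. Writing $a_+$ for the point after one Newton step (a notation introduced only for this remark), we get

$$a_+ = a - \frac{f'(a)}{f''(a)} = a - \frac{a^3 - 1}{3a^2} = \frac{2a^3 + 1}{3a^2},$$

so $a_+ = 0$ exactly when $a^3 = -\frac{1}{2}$. Starting at this point, the first Newton step lands at $0$, and the second step would again require a division by $f''(0) = 0$.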

Figure 4.1: Newton fractal of $z^3 - 1 = 0$. The three colors indicate the region of attraction for the three roots. The shade of the color indicates the number of steps needed to come close to the corresponding root. (Source: Wikipedia).

This pure Newton method is an idealization and sometimes it cannot be performed at all because it can very well happen that $a + x^* \notin \Omega$. One can circumvent these problems by replacing the Newton step $a \leftarrow a + x^*$ by a damped Newton step $a \leftarrow a + \theta x^*$ with some step size $\theta > 0$ which is chosen to ensure e.g. that $a + \theta x^* \in \Omega$. Choosing the right $\theta$ using a line search can be done in many ways. A popular choice is backtracking line search using the Armijo-Goldstein condition.
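A damped Newton step with backtracking can be sketched as follows. Everything specific in this example is an assumption made for illustration: the test function $f(z) = z - \log z$ on $\Omega = (0, \infty)$, the parameters `alpha` and `beta`, and the tolerance `eps`. The sketch uses only the Armijo sufficient decrease condition, which is one common form of backtracking.

```python
import math

def f(z):
    """Assumed test function on Omega = (0, inf): f(z) = z - log(z), minimizer z = 1."""
    return z - math.log(z)

def df(z):
    return 1.0 - 1.0 / z

def d2f(z):
    return 1.0 / z ** 2

def damped_newton(a, alpha=0.25, beta=0.5, eps=1e-12, max_iter=100):
    """Newton's method with backtracking line search (Armijo sufficient decrease)."""
    for _ in range(max_iter):
        g = df(a)
        if g * g <= eps:                 # stopping criterion on the squared gradient
            break
        x = -g / d2f(a)                  # Newton direction x*
        theta = 1.0
        # Shrink theta until a + theta * x lies in Omega and f decreases sufficiently.
        while a + theta * x <= 0.0 or f(a + theta * x) > f(a) + alpha * theta * g * x:
            theta *= beta
        a = a + theta * x                # damped Newton step
    return a

pure_step = 3.0 - df(3.0) / d2f(3.0)     # the pure Newton step from a = 3 leaves Omega
a_min = damped_newton(3.0)
print(pure_step, a_min)
```

From the starting point $a = 3$ the pure Newton step would leave $\Omega$, whereas the damped iteration stays inside $\Omega$ and approaches the minimizer $z = 1$.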

Let us discuss stopping criteria a bit: One possible stopping criterion is for example that the norm of the gradient is small, i.e. for some predefined positive $\varepsilon$ we do the iteration until

$$\|\nabla f(a)\|^2 \le \varepsilon. \tag{4.1}$$

We now derive a stopping criterion in the case when the function $f$ is not only strictly convex but also strongly convex. This means that there is a positive constant $m$ so that the smallest eigenvalue of all Hessian matrices of $f$ is at least $m$:

$$\forall a \in \Omega : \lambda_{\min}(\nabla^2 f(a)) \ge m.$$

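By the Lagrange form of the Taylor expansion we have

$$\forall a, a + x \in \Omega \;\; \exists \xi \in [a, a + x] : f(a + x) = f(a) + \nabla f(a)^T x + \frac{1}{2} x^T \nabla^2 f(\xi) x$$

and the strong convexity of $f$ together with the variational characterization of the smallest eigenvalue, which says that

$$\lambda_{\min}(\nabla^2 f(\xi)) = \min_{x \in \mathbb{R}^n \setminus \{0\}} \frac{x^T \nabla^2 f(\xi) x}{\|x\|^2},$$

gives

$$f(a + x) \ge f(a) + \nabla f(a)^T x + \frac{1}{2} m \|x\|^2.$$

Consider the function of the right hand side

$$x \mapsto f(a) + \nabla f(a)^T x + \frac{1}{2} m \|x\|^2.$$

It is a convex quadratic function with gradient

$$x \mapsto \nabla f(a) + m x,$$

hence its minimum is attained at

$$x^* = -\frac{1}{m} \nabla f(a).$$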

So we have for the minimum $\mu^*$ of $f$

$$\mu^* \ge f(a) + \nabla f(a)^T \left( -\frac{1}{m} \nabla f(a) \right) + \frac{1}{2} m \left\| \frac{1}{m} \nabla f(a) \right\|^2 = f(a) - \frac{1}{2m} \|\nabla f(a)\|^2,$$

which says that whenever the stopping criterion (4.1) is fulfilled we know that $f(a)$ and $\mu^*$ are at most $\varepsilon/(2m)$ apart. Of course, the drawback of this consideration is that one has to know or estimate the constant $m$ in advance which is often not easy. Nevertheless the consideration at least shows that the stopping criterion is sensible.
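The bound $f(a) - \mu^* \le \frac{1}{2m} \|\nabla f(a)\|^2$ can be checked numerically on a small example. The function below is an assumption made for illustration: $f(z) = z^2 + e^z$ has $f''(z) = 2 + e^z \ge 2$, so it is strongly convex with $m = 2$. The names `M`, `mu_star` and `gap_and_bound` are hypothetical.

```python
import math

M = 2.0  # strong convexity constant of the assumed f: f''(z) = 2 + exp(z) >= 2

def f(z):
    return z * z + math.exp(z)

def df(z):
    return 2.0 * z + math.exp(z)

def d2f(z):
    return 2.0 + math.exp(z)

# Approximate the minimum mu* of f with pure Newton steps starting at z = 0.
z = 0.0
for _ in range(50):
    z -= df(z) / d2f(z)
mu_star = f(z)

def gap_and_bound(a):
    """Return the true gap f(a) - mu* and the bound |f'(a)|^2 / (2 m)."""
    return f(a) - mu_star, df(a) ** 2 / (2.0 * M)

for a in [-2.0, -1.0, 0.0, 0.5, 3.0]:
    gap, bound = gap_and_bound(a)
    print(f"a = {a:5.1f}   gap = {gap:.4f}   bound = {bound:.4f}")
```

In each row the gap should not exceed the bound, so a small squared gradient norm certifies a small distance to the optimal value whenever $m$ is known.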

Equality-constrained minimization

In the next step we show how to modify Newton’s method if we want to find the minimum of a strictly convex, differentiable function $f : \Omega \to \mathbb{R}$ in an affine subspace given by the equations

$$a_1^T x = b_1, \quad a_2^T x = b_2, \quad \ldots, \quad a_m^T x = b_m,$$

where $a_1, \ldots, a_m \in \mathbb{R}^n$ and $b_1, \ldots, b_m \in \mathbb{R}$.

We define the Lagrange function

$$L(x, \lambda_1, \ldots, \lambda_m) = f(x) + \sum_{i=1}^m \lambda_i a_i^T x,$$

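and the method of Lagrange multipliers says that if a point $y^*$ lies in the affine space

$$a_1^T y^* = b_1, \quad \ldots, \quad a_m^T y^* = b_m,$$

then it is the unique minimizer of $f$ in this affine space if and only if

$$\nabla L(y^*) = 0$$

for suitable multipliers $\lambda_1, \ldots, \lambda_m$, where the gradient is taken with respect to $x$.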

To find this point $y^*$ we approximate the function $f$ using the Taylor approximation around the point $a$ by

$$q(x) = f(a) + \nabla f(a)^T x + \frac{1}{2} x^T \nabla^2 f(a) x$$

and solve the linear system (in the variables $x^*$ and $\lambda_1, \ldots, \lambda_m$)

$$\begin{aligned}
a_1^T (a + x^*) = b_1, \quad \ldots, \quad a_m^T (a + x^*) &= b_m, \\
\nabla f(a) + \nabla^2 f(a) x^* + \sum_{i=1}^m \lambda_i a_i &= 0
\end{aligned}$$

to find the Newton direction $x^*$. Then we can do the same Newton iterations using damped Newton steps as in the case of unconstrained optimization.
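As an illustration of this linear system, the following sketch treats a small example with one constraint. All specifics are assumptions for the example: the function $f(x) = e^{x_1} + e^{x_2}$, the constraint $x_1 + x_2 = 0$, the starting point, and the helper names `solve` and `newton_step`. For simplicity it takes full steps ($\theta = 1$) rather than damped ones.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= t * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def newton_step(a, a1, b1):
    """Newton direction x* and multiplier for f(x) = exp(x_1) + exp(x_2), a1^T x = b1."""
    g = [math.exp(a[0]), math.exp(a[1])]   # gradient of f at a; the Hessian is diag(g)
    # First two rows: grad f(a) + Hess(a) x* + lambda * a1 = 0.
    # Last row:       a1^T (a + x*) = b1.
    K = [[g[0], 0.0, a1[0]],
         [0.0, g[1], a1[1]],
         [a1[0], a1[1], 0.0]]
    rhs = [-g[0], -g[1], b1 - (a1[0] * a[0] + a1[1] * a[1])]
    x0, x1, lam = solve(K, rhs)
    return [x0, x1], lam

a = [1.0, 2.0]                             # assumed starting point, not feasible
for _ in range(20):
    x, lam = newton_step(a, [1.0, 1.0], 0.0)
    a = [a[0] + x[0], a[1] + x[1]]         # full Newton step (theta = 1)
print(a, lam)
```

With a full step the iterate satisfies the constraint after the first iteration, since the last row of the system enforces $a_1^T (a + x^*) = b_1$. With a damped step and an infeasible starting point this would no longer hold exactly.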
