Pesqui. Oper. vol.34 número3

(1)

doi: 10.1590/0101-7438.2014.034.03.0585

A SURVEY ON MULTIOBJECTIVE DESCENT METHODS

Ellen H. Fukuda

1

_{and Luis Mauricio Gra˜na Drummond}

2*

Received September 28, 2013 / Accepted January 2, 2014

ABSTRACT.We present a rigorous and comprehensive survey on extensions to the multicriteria setting of three well-known scalar optimization algorithms. Multiobjective versions of the steepest descent, the projected gradient and the Newton methods are analyzed in detail. At each iteration, the search directions of these methods are computed by solving real-valued optimization problems and, in order to guarantee an adequate objective value decrease, Armijo-like rules are implemented by means of a backtracking pro-cedure. Under standard assumptions, convergence to Pareto (weak Pareto) optima is established. For the Newton method, superlinear convergence is proved and, assuming Lipschitz continuity of the objectives second derivatives, it is shown that the rate is quadratic.

Keywords: multiobjective optimization, Newton method, nonlinear optimization, projected gradient method, steepest descent method.

1 INTRODUCTION

In multiobjective (or multicriteria) optimization, finitely many objective functions have to be minimized simultaneously. Hardly ever a single point will minimize all of them at once, so we need another notion of optimality. Here, we will use the concepts ofParetoandweak Pareto optimality. A point is called Pareto optimal or efficient, if there does not exist a different point with the same or smaller objective function values, such that there is a strict decrease in at least one objective function value. A point is called weakly Pareto optimal or weakly efficient if there does not exist a different point with strict decrease in all its objective values. Applications for this type of problem can be found in engineering [14] (especially truss optimization [12, 32], design [20, 31], space exploration [39, 41]), statistics [10], management science [15, 40, 42], environmental analysis [34, 16], etc.

One of the main solution strategies for multiobjective optimization problems is the scalarization approach [23, 24, 43]: we obtain the optimal solutions of the vector-valued problem by solving

*Corresponding author.

1Graduate School of Informatics, Kyoto University, 606-8501, Kyoto, Japan. E-mail: ellen@i.kyoto-u.ac.jp

(2)

one or several parametrized single-objective (i.e., scalar-valued) optimization problems. Among the scalarization techniques, we have the so-called weighting method, where one minimizes a nonnegative linear combination of the objective functions [26, 29, 30, 37]. The parameters are not known in advance and the modeler or decision-maker has to choose them. For some prob-lems, this choice can be problematic, as shown in the example provided in [18, Section 7], where almost all choices of the parameters lead to unbounded scalar problems. Only recently, adap-tive scalarization techniques, where scalarization parameters are chosen automatically during the course of the algorithm such that a certain quality of approximation is maintained, have been proposed [13, 17]. Still other techniques, working only in the bicriteria case, can be viewed as choosing a fixed grid in a particular parameter space [11, 12].

Multicriteria optimization algorithms that do not scalarize have recently been developed (see, e.g., [5, 6] for an overview on the subject). Some of these techniques are extensions of scalar optimization algorithms (notably the steepest descent algorithm [19] with at most linear conver-gence), while others borrow heavily from ideas developed in heuristic optimization [33, 38]. For the latter, no convergence proofs are known, and empirical results show that convergence gen-erally is, as in the scalar case, quite slow [44]. Other parameter-free multicriteria optimization techniques use an ordering of the different criteria, i.e., an ordering of importance of the compo-nents of the objective function vector. In this case, the ordering has to be prespecified. Moreover, the corresponding optimization process is usually augmented by an interactive procedure [36].

Following the research line opened in 2001 with [19], other classical (scalar) optimization pro-cedures where extended in recent years, not just for the multicriteria setting, but also for vector optimization problems, i.e., when other underlying ordering cones are used instead of the non-negative orthant. In [28], a steepest descent method is proposed, while in [1, 21, 22, 25], several versions of the projected gradient method are studied. Proximal point type methods are analyzed in [4]. Trust region strategies for multicriteria were developed in [8, 9]. Even a steepest descent method for multicriteria optimization on Riemannian manifolds was developed in [2]. In 2009, Fliege et al. [18] came up with a multiobjective version of the Newton method, while later on, Gra˜na Drummond et al. [27] proposed a Newton method for vector optimization.

These extensions share an essential feature: they are all descent methods, i.e., the vector objec-tive value decreases at each iteration in the partial order induced by the underlying cones. Under reasonable assumptions, full convergence of sequences produced by these algorithms is estab-lished in all those works. As expected, for the vector-valued Newton methods, convergence is local andfast, while for the others it is global (and not so fast).

(3)

The outline of this paper is as follows. In Section 2 we establish the notation, as well as the basic concepts, and define the (smooth) unconstrained multiobjective problem. We also state and prove a simple but important result on descent directions. In Section 3, we study the multiobjective steepest descent method. First, we introduce the search direction and analyze its properties. Then we describe the algorithm and, finally, we perform the convergence analysis of the method. Basically, we establish that, under convexity of the objectives and another natural assumption, any sequence produced by the method is globally convergent to a weak Pareto optimal point. In Section 4 we define the smooth constrained multicriteria problem, as well as the multiobjective projected gradient algorithm and we present its convergence analysis, which is quite similar to the previous one. In Section 5, for twice continuously differentiable strongly convex objectives, we define the multiobjective Newton method and, assuming similar conditions as those of the previous cases, we establish its convergence to Pareto optimal points with superlinear rate. Under the additional hypothesis of Lipschitz continuity of the objectives’ second derivatives, we prove quadratic convergence. In Section 6 we make some comments on how to obtain (strong) Pareto optima with the first two algorithms and on how to approach the whole efficient frontier with the three methods.

2 PRELIMINARIES

All over this work,·,_·will stand for the usual canonical inner product inRp (here, we will just deal withp ₌nor p ₌m), i.e.,u, v₌ p

i=1uivi =u⊤v. Likewise, · will always be the Euclidean norm, i.e.,u₌ √u,u. For an m_×n real matrix A,A will denote its operator (Euclidean) norm. As usual,R+andR++designate the sets of nonnegative and positive

real numbers, respectively.

LetRm₊:=R+× · · ·×R+andRm++:=R++× · · ·×R++be the nonnegative orthant orParetian

cone and the positive orthant ofRm, respectively. Consider the partial order induced byRm₊: for u, v_∈Rm,u_≤v(alternatively,v_≥u) ifv₋u _∈Rm₊, as well as the following stronger relation: u < v (alternatively,v >u) ifv₋u _∈R₊₊m . In other words,u _≤vstands forui ≤ vi for all i₌1, . . . ,m, andu< vshould also be understood as a (strict) componentwise inequality.

Let

F_: Rn→Rm

be a continuously differentiable mapping. Our problem consists of finding Pareto or weak Pareto points forF inRnand we note it by

minimize F(x)

subject to x_∈Rn_. (1)

Recall thatx∗_∈Rnis aPareto optimal(orefficient) point forF, if

(4)

that is to say, there does not exist x _∈ Rn such thatFi(x) ≤ Fi(x∗)for alli = 1, . . . ,mand Fi₀(x) < Fi₀(x∗)for at least onei0. A pointx∗ _∈ Rn is aweakly Pareto optimal(orweakly efficient) point forF, if

there is nox_∈Rnsuch that F(x) < F(x∗),

i.e., there does not exist x _∈ Rn _{such that} _Fi₍_x_{) <} _Fi₍_x∗₎_{for all}_i ₌ ₁_{, . . . ,}_m_{. Clearly, if}

x∗_∈Rnis a Pareto optimal point, thenx∗is a weak Pareto point, and the converse is not always true.

As wee will see in our first lemma, a necessary condition for optimality of a point x¯ _∈ C is stationarity(orcriticality):

J F(x_¯)(Rn)_{∩ [−}Rm₊₊] = ∅,

whereJ F(x_¯)stands for the Jacobian matrix ofFatx¯(them_×nmatrix with entries(J F(x_¯))i,j =

∂Fi(x_¯)/∂xj),J F(x¯)(Rn):=J F(x¯)v:v ∈Rnand−Rm₊₊:= {−u:u ∈Rm₊₊}. Sox¯∈Rnis stationaryforFif, and only if, for allv_∈Rnwe have J F(x_¯)v<0, that is to say for allvthere exists j₌ j(v)such that

(J F(x_¯)v)j = ∇Fj(x¯)⊤v≥0, (2)

and this condition implies

max

i=1,...,m∇Fi(x¯)

⊤_v_≥₀ _{for all}_v_∈_Rn_.

Note that form ₌ 1, we retrieve the classical stationarity condition for unconstrained scalar-valued optimization:∇F(x_¯)₌0.

By definition, a point x _∈ Rn is nonstationary if there exists a directionv _∈ Rn such that J F(x)v <0. The next result shows not only that suchvis aRm₊-descent directionforF atx (there existsε >0 such that F(x₊tv) <F(x)for allt _∈(0, ε_]), but it also estimates theRm₊ -function decrease. As we will see later on, this result allows us to implement, by a backtracking procedure, an Armijo-like rule on all the multiobjective (descent) methods we will study here.

Lemma 2.1. Let F_:Rn → Rm be continuously differentiable,v _∈ Rn such that J F(x)v < 0 andσ _∈(0,1). Then there exists someε >0(which may depend on x,vandσ) such that

F(x₊tv) <F(x)₊σt J F(x)v for any t_∈(0, ε_]. (3)

In particular,vis a descent direction for F at x.

Proof. SinceFis differentiable,∇Fi(x)⊤v <0 andσ _∈(0,1), we have lim

t→0

Fi(x₊tv)₋Fi(x)

t = ∇Fi(x)

⊤_{v < σ}_∇_F

i(x)⊤v for alli=1, . . . ,m.

So there existsε >0 such that

(5)

3 THE MULTIOBJECTIVE STEEPEST DESCENT METHOD

In this section we propose an extension of the classical steepest descent method. First we state and prove some results that will allow us to define the algorithm. Then we present its convergence analysis and, finally, we briefly expose a numerically robust version of the method which admits relative errors on computing the search directions at each iteration.

3.1 The search direction for the multiobjective steepest descent method For a given pointx_∈Rn, define the functionϕx:Rn→Rby

ϕx(v):= max

i₌1,...,m∇Fi(x)

⊤_v. ₍₄₎

As being the maximum of linear functions,ϕxis convex and positively homogeneous (ϕx(λv)=

λϕx(v)for allλ >0 and allv∈Rn). Moreover, sinceu →maxi=1,...,muiis Lipschitz continu-ous with constant 1, we also have

ϕx(v+w) ≤ ϕx(v)+ϕx(w),

|ϕx(v)−ϕy(w)| ≤ J F(x)v−J F(y)w

(5)

for allv, w_∈Rn. Consider now the unconstrained scalar-valued minimization problem

min

v ϕx(v)+

1 2v

2_.

(6)

Since the objective function is proper, closed and strongly convex, this problem has always a (unique) optimal solutionv(x), which we callsteepest descent direction, and is therefore given by

v(x)_:=argmin

v_∈Rn

ϕx(v)+ 1 2v

2_. ₍₇₎

Let us callθ (x)the optimal value of (6), that is to say,

θ (x)_:=ϕx(v(x))+ 1 2v(x)

2_.

(8)

Ifm ₌1, we fall back to scalar minimization andϕx(v) = ∇F(x)⊤v, sov(x)= −∇F(x)and we retrieve the classical steepest descent direction atx_∈Rn.

Note that, in order to get rid of the possible nondifferentiability of (6), we can reformulate the problem as

minimize τ

subject to _∇Fi(x)⊤v+ 1 2v

2

−τ _≤ 0, i₌1, . . . ,m,

a convex problem with variables(τ, v)_∈R×Rnfor which there are efficient numerical solvers.

(6)

Proposition 3.1.Letv_: Rn→Rnandθ_:Rn→Rgiven by(7)and(8), respectively. Then, the following statements hold.

1. θ (x)_≤0for all x_∈Rn.

2. The functionθ (_·)is continuous.

3. The following conditions are equivalent:

(a) The point x is nonstationary.

(b) θ (x) <0.

(c) v(x)₌0.

In particular, x is stationary if and only ifθ (x)₌0.

Proof. 1. From (7) and (8), for anyx_∈Rn, we haveθ (x)_≤ϕx(0)+1₂02=0.

2. Let us proof the continuity ofθ in an arbitrary but fixedxo _∈ Rn_{. First let us show that the} functionx _→ v(x)is bounded on compact subsets ofRn. LetC _⊂Rnbe compact. Note that for allx_∈Cand alli ₌1, . . . ,m, we have

−∇Fi(x)v(x) + 1 2v(x)

2

≤ ∇Fi(x)⊤v(x)+ 1 2v(x)

2

≤ ϕx(v(x))+ 1 2v(x)

2

= θ (x) ≤ 0,

where we used the Cauchy-Schwarz inequality, the definitions ofϕx andθ, as well as item 1. Therefore, since F is continuously differentiable, we conclude that there existsκ ₌κ(C) > 0 such thatv(x)_≤κ for allx_∈C. In particular, there existsκ >¯ 0 such that

v(x)_{≤ ¯}κ for allx_{∈ ¯}B(xo,1), (9)

whereB¯(xo,1)is the closed ball with center atxoand radius 1.

Now, forx_{∈ ¯}B(xo_,₁₎_and_i _{∈ {}₁_{, . . . ,}_m_}_{, define}

φx,i:Rn→R

by

z_{→ ∇}Fi(z)⊤v(x)+ 1 2v(x)

2_.

(7)

constant 1,{x}x∈ ¯B(xo_,₁₎is also an equicontinuous family of functions. Then for anyε >0 and

allx_{∈ ¯}B(xo,1), there exists 0< δ <1 such that

z_{− ˜}z< δ _⇒ _|x(z)−x(z˜)|< ε.

Hence forz₋xo_{< δ}_{, from the definitions of functions}_{θ , v,}

xoand the fact that_xo₍xo₎=

θ (xo), we have

θ (z) _≤ max

i=1,...,m∇Fi(z)

⊤_v(_xo₎

+1

2v(x o₎

2

= xo(z)

≤ xo(xo)+ |_xo(z)−_xo(xo)|

< θ (xo)₊ε,

soθ (z)₋θ (xo) < ε. Interchanging the roles ofzandxo, we conclude that, forz₋xo< δ,

|θ (z)₋θ (xo)_|< ε.

3. (a)_⇒(b). Suppose that x is nonstationary. Then, there existsv _∈ Rn such thatϕx(v) = maxi∇F(x)⊤v < 0. Therefore, from the optimality ofv(x)and the fact thatϕx is positively homogeneous,

θ (x)_≤ϕx(τ v)+ 1 2τ v

2

=τϕx(v)+ 1 2τv

2

for allτ >0. Taking 0< τ <₋2ϕx(v)/v2, we see thatθ (x) <0.

(b)⇒(c). Since, by item 1,θ (x)_≤0, it is enough to notice thatv(x)₌0 impliesθ (x)₌0.

(c)⇒(a). Ifv(x) ₌ 0, then maxi∇Fi(x)⊤v(x) = ϕx(v(x)) < θ (x) ≤ 0, so J F(x)v(x) ∈

J F(x)(Rn)_{∩ [−}Rm₊₊]andxis nonstationary.

Note that for a nonstationary pointx, using the above characterization, we haveθ (x) <0 and so, from (8), we getϕx(v(x)) <−1₂v(x)2 <0, which in turn implies thatJ F(x)v(x) < 0. Whence, by Lemma 2.1, there existsε >0 such that

F(x₊tv(x)) <F(x)₊βt J F(x)v(x) for allt _∈(0, ε_],

and, in particular,v(x)is a descent direction, i.e.,F(x₊tv(x)) <F(x)_∀t _∈(0, ε_].

Finally, observe that, while proving Proposition 3.1, we saw thatv_: Rn→Rn, defined by (7), is bounded on compacts sets ofRn. Actually, it can be seen something stronger, namely thatv(_·)is a continuous function [28, Lemma 3.3].

3.2 The multiobjective steepest descent algorithm

We are now in conditions to define the extension of the classical steepest descent method for the unconstrained multiobjective optimization problem (1).

(8)

1. Chooseσ _∈(0,1)andx0_∈Rn. Setk_:=0.

2. Computevk _:=v(xk)₌argmin

v_∈Rn

ϕ_xk(v)+

1 2v

2 ,

whereϕ_xk(v):= max

i=1,...,m∇Fi(x k₎⊤_v_.

3. Computeθ (xk)₌ϕ_xk(vk)+

1 2v

k2_{. If}_{θ (}_xk₎₌_{0, then stop.}

4. Choosetkas the largestt ∈ {1/2j: j =0,1,2, . . .}such that

Fxk₊tvk_≤F(xk)₊σt J F(xk)vk.

5. Setxk+1_:=xk₊tkvk, k:=k+1 and go to Step 2.

Some comments are in order. Sincev _→ ϕ_xk(v)+₂1v2is strongly convex,vk exists and is

unique. Note that, by Proposition 3.1, the stopping ruleθ (xk)₌0 can be replaced byvk ₌0. Also note that, if at iterationkthe algorithm does not reach Step 4, i.e., if it stops at Step 3, then, by Proposition 3.1,xkis a stationary point forF.

If Step 4 is reached at iterationk, by our comments after Proposition 3.1 and Lemma 2.1, the computation of the step length in Step 4 ends up in a finite number of half reductions. Observe that instead of the factor 1/2 in the reductions of Step 4 we could have used any other scalar ν _∈(0,1). If the algorithm does not stop, from Step 4 and the fact thatJ F(xk)vk <0, we have that the objective values sequence{F(xk)_}isRm₊-decreasing, i.e.,

F(xk+1) < F(xk) for all k.

If the algorithm stops at iterationk0, the above inequality holds fork=0,1, . . . ,k0−1.

3.3 Convergence analysis of theMSDM: the general case

Since Algorithm 3.2 ends up with a stationary point or produces infinitely many iterates, from now on, we assume that an infinite sequence{xk_}of nonstationary points is generated. We be-gin presenting a simple application of the standard convergence argument for the scalar-valued steepest descent method.

Theorem 3.3. Let_{xk_}be a sequence produced by Algorithm 3.2. Every accumulation point of

{xk_}, if any, is a stationary point.

Proof. Letx¯be an accumulation point of the sequence{xk_}. Considerv(x_¯)andθ (x_¯), given by (7) and (8), respectively, i.e., the solution and the optimal value of (6) atx _{= ¯}x. According to Proposition 3.1, it is enough to prove thatθ (x_¯)₌0.

We know that the sequence{F(xk)_}isRm₊-decreasing (i.e., componentwise strictly decreasing), so we have that

lim k→∞F(x

(9)

Therefore,

lim k_→∞F(x

k₎

−F(xk+1)₌0.

But, by Steps 2-5 of Algorithm 3.2,

F(xk)₋F(xk+1)_{≥ −}tkβJ F(xk)vk _≥0,

and therefore

lim

k→∞tkJ F(x

k_)vk

=0,

which, componentwise, can be written as

lim

k_→∞tk∇Fi(x k₎⊤_vk

=0 for all i₌1, . . . ,m. (10)

Considering thattk ∈(0,1]for allk, we have the following two possibilities:

lim sup k→∞

tk >0 (11)

or

lim sup k→∞

tk ₌0. (12)

First assume that (11) holds. Then there exists a subsequence{xkj_}

j converging tox¯andt¯>0 such that limj→∞tkj = ¯t. So, using (10), we get

0= lim j→∞∇Fi(x

kj₎⊤_vkj _≤ _lim

j→∞ϕxk j(v

kj₎₊1

2v

kj2₌ _lim

j→∞θ (x

kj₎₌_{θ (}_x_¯_),

where the last equality is a consequence of Proposition 3.1, 2. So, by Proposition 3.1, 3, we conclude thatθ (x_¯)₌0, andx_¯is stationary.

Now assume that (12) holds. Due to the the fact thatv(_·)is bounded on compacts,vk ₌v(xk)for allkand_{xk_}has a convergent subsequence, it follows that the sequence_{vk_}has also a bounded subsequence. Therefore there existsv¯_∈Rnand a subsequence{vkj_}

jsuch that limj→∞vkj = ¯v

and limj→∞tkj =0. Note that we have

max i ∇Fi(x

kj₎⊤_vkj _≤_{θ (v}kj_{) <}₀ _{for all} _j_,

so lettingj _{→ ∞}, we get

max i ∇Fi(x¯)

⊤_v_¯_≤_{θ (}_x_¯₎_≤₀_. ₍₁₃₎

Take now a fixed but arbitrary positive integerq. Sincetkj →0, forj large enough, we have

tkj <

1 2q,

which means that the Armijo-like condition atxkj _{in Step 4 of Algorithm 3.2 is not satisfied for}

t₌1/2q, i.e.,

(10)

So for all j there existsi ₌i(kj)∈ {1, . . . ,m}such that

Fixkj ₊₍₁_/₂q_)vkj_≥_Fi₍_xkj₎₊_{σ (}₁_/₂q₎_∇_Fi₍_xkj₎⊤_v_¯kj_.

Since {i(kj)}j ⊂ {1, . . . ,m}, there exists a subsequence {kjℓ}ℓ and an index i0 such that

i0=i(kjℓ)for allℓ=1,2, . . . and

Fi0

xkj_ℓ ₊₍₁_/₂q_)vkj_ℓ

≥Fi0(x

kj_ℓ₎₊_{σ (}₁_/₂q₎_∇_F

i0(x

kj_ℓ₎⊤_v_¯kj_ℓ_.

Taking the limitℓ_{→ ∞}in the above inequality, we obtain

Fi0

¯

x₊(1/2q)v_¯

≥ Fi0(x¯)+σ (1/2 q₎

∇Fi0(x¯)⊤v.¯

Since this inequality holds for any positive integerqand fori0(depending onq), by Lemma 2.1 it follows thatJ F(x_¯)v_¯ <0, so

max i ∇Fi(x¯)

⊤_v_¯ _≥₀_,

which, together with (13), impliesθ (x_¯) ₌ 0. Therefore, we conclude that x_¯ is a stationary

point.

Assume now thatx0 _{∈ {}x _∈ Rn: F(x)_≤ F(x0)_}, a bounded level set. Since{F(xk)_}isRm₊ -decreasing, the whole sequence{xk_}is contained in the above set, so it is bounded and it has at least one accumulation point, which, by the previous theorem, is stationary forF.

3.4 Full convergence of theMSDM: theRm+-convex case

In order to characterizev(x), the optimal solution of (6), we begin this section by recalling a well-known result on nonsmooth unconstrained optimization, which establishes optimality con-ditions for max-type real-valued objective functions.

Proposition 3.4. Let hi_: Rn → Rbe a differentiable function for i ₌1, . . . ,m. Ifx_ˆ _∈ Rn is an unconstrained minimizer ofmaxi=1,...,mhi(x), then there existsα=α(xˆ)∈Rm, withαi ≥0 for all i andm

i=1αi =1, such that, forτˆ :=maxihi(xˆ), it holds m

i=1

αi∇hi(xˆ)=0, hi(xˆ)− ˆτ ≤0 and αihi(xˆ)− ˆτ=0, i =1, . . . ,m.

If hi is convex for all i, the above conditions are also sufficient for the optimality ofx .ˆ

Proof. First note that the problem minx∈Rnmax_i₌₁_,...,_mh_i(x)can be reformulated as

minimize τ

subject to hi(x)−τ ≤ 0, i=1, . . . ,m,

(14)

with optimal solution (τ ,_ˆ x_ˆ) _∈ R×Rn. Since the Lagrangian of this problem is given by L((τ,x), α)_:=τ₊m

i=1αi

(11)

just those stated in this proposition. Note now that (14) has a Slater point (e. g.,(τ ,_˜ 0)_∈R×Rn, whereτ >˜ hi(0)for alli). So there exists a vector of KKT multipliersα=α(xˆ)such that those conditions are satisfied.

For the converse, just note that as (14) is a convex problem on the variables(τ,x), the KKT conditions are sufficient for the optimality of(τ ,_ˆ x_ˆ).

By (6)-(7),v(x)₌argminvmaxi∇Fi(x)⊤v+(1/2)v2. Therefore, in view of the above propo-sition, there existsω(x)_:=α(v(x))_∈Rm₊, withm

i₌1ωi(x)=1 such that

m

i=1

ωi(x)∇Fi(x)+v(x)=0,

and so

v(x)_{= −}

m

i₌1

ωi(x)∇Fi(x). (15)

The above equality can be rewritten as

v(x)_{= −}J F(x)⊤w(x). (16)

Note that (15) tells us that the steepest descent directionv(x)is a classical steepest descent direction for the scalar-valued problem minxmi₌1ωiFi(x), whose weightsωi :=ωi(x)≥0 are thea prioriunknown KKT multipliers for problem (14), withhi =Fifor alli, at(θ (x), v(x)).

Let us now introduce an important concept. We say that the mappingF_:Rn→RmisRm₊-convex if

Fλx₊(1−λ)z_≤λF(x)₊(1−λ)F(z) for all x,z_∈Rn and allλ_{∈ [}0,1]. (17)

Clearly,F_:Rn→ RmisRm₊-convex if and only if its components Fi: Rn→Rare all convex. Under this additional assumption, we have the following extension of the well-known scalar inequality which establishes that a convex differentiable function always overestimates its linear approximation:

F(u)_≥ F(w)₊J F(w)(u₋w) for any u, w_∈Rn. (18)

In fact, this vector inequality is equivalent to the followingmscalar inequalitiesFi(u)_≥ Fi(w)₊ ∇Fi(w)⊤(u−w)for allu, w∈Rnand alli.

Let us now state and prove a simple result relating Pareto optimality and stationarity.

Lemma 3.5.Letx be a weak Pareto optimum for F. Then_ˆ x is a stationary point. If, in addition,_ˆ F is Rm₊-convex, this condition is also sufficient for weak Pareto optimality. So, under Rm₊ -convexity, the concepts of weak efficiency and stationarity are equivalent.

(12)

Now suppose thatxîs stationary for theRm₊-convex mappingFand take anyx_∈Rn. Sincexîs stationary, (2) holds forx _{= ˆ}x,v _:=x_{− ˆ}xand some j ₌ j(x_ˆ). Therefore, by (18) foru ₌ x andw_{= ˆ}x,Fj(x)_≥Fj(xˆ), and soxîs weak a Pareto optimum forF.

We will keep on studying the case in which the algorithm does not have finite termination and therefore it produces infinite sequences{xk_},{vk_}and{tk}. Let us now state the additional as-sumption under which, in theRm₊-convex case, we will prove full convergence of{xk_}to a sta-tionary point or, in view of the above lemma, to a weak unconstrained optimum ofF.

Assumption 3.6. Every Rm₊-decreasing sequence _{yk_{} ⊆} F(Rn) _{:= {}F(x)_: x _∈ Rn}isRm₊ -bounded from below by a point in F(Rn), i.e., for any_{yk_}contained in F(Rn)with yk+1< yk for all k, there existsx_ˆ_∈Rnsuch that F(x_ˆ)_≤ ykfor all k.

Some comments concerning the generality/restrictiveness of this assumption are in order. In the classical unconstrained (convex) optimization case, this condition is equivalent to existence of solutions of the optimization problem. This assumption, known asRm₊-completeness, is standard for ensuring existence of efficient points for vector optimization problems [35, Section 3].

We will need the following technical result in order to prove that theMSDMis convergent.

Lemma 3.7. Suppose that F isRm₊-convex and letx_ˆ_∈Rnbe such that F(x_ˆ)_≤ F(xk)for some k. Then, we have

ˆx₋xk+12_{≤ ˆ}x₋xk2₊xk₋xk+12.

Proof. By (16), there existswk_:=w(xk)_∈Rm₊such that

vk_{= −}J F(xk)⊤wk. (19)

Using theRm₊-convexity of F, we have that (18) holds, so F(xk)₊J F(xk)(x_ˆ ₋xk) _≤ F(x_ˆ). SinceF(x_ˆ)_≤F(xk), we get

J F(xk)(x_ˆ₋xk)_≤0.

Taking into account thatwk_∈Rm₊and using the above vector inequality as well as (19), we get

−(vk)⊤(x_ˆ₋xk)₌(wk)⊤J F(xk)(x_ˆ₋xk)_≤0.

Recall thatxk+1₌xk₊tkvk, withtk>0. Therefore, from the above inequality we get

(xk₋xk+1)⊤(x_ˆ₋xk)_≤0,

which implies the desired inequality, because

ˆx₋xk+12_{= ˆ}x₋xk2₊xk₋xk+12₊2(xk₋xk+1)⊤(x_ˆ₋xk).

(13)

Lemma 3.8. Let _{x(k)_} be any infinite sequence of nonstationary points in Rn such that F(x(k+1))_≤F(x(k))for all k.

1. Ifx is an accumulation point of_¯ _{x(k)_}, then

F(x_¯)_≤F(x(k)) for all k (20)

and

lim k_→∞F(x

(k)₎

=F(x_¯).

Also, F is constant in the set of accumulation points of_{x(k)_}.

2. If_{xk_}, generated by Algorithm 3.2, has an accumulation point, then all results of item 1 hold for this sequence.

Proof. 1. Let_{x(kj)_}

j be a subsequence that converges tox¯. Take a fixed but arbitraryk. For j large enough, we havekj >kandF(x(kj))≤F(x(k)).So letting j → ∞, we get

F(x_¯)_≤F(x(k)).

Since the above inequality holds componentwise andFi(x(kj)₎_→ _Fi₍_x_¯₎_as _j_{→ ∞}_{for all}_i_{, we}

have limk_→∞F(x(k))=F(x¯).

Now letx_ˆbe another accumulation point of_{x(k)_}. Then there exists a subsequence_{x(kℓ)_}

ℓthat

converges tox_ˆ. By (20),F(x_¯) _≤ F(x(kℓ)₎_{for all}_ℓ_{, so letting}_ℓ _{→ ∞}_{, we get}_F₍_x_¯₎_≤ _F₍_x_ˆ₎_. Interchanging the roles ofx¯ andxˆ, by the same reasoning, we get F(x_ˆ) _≤ F(x_¯). These two Rm₊-inequalities imply thatF(x_ˆ)₌F(x_¯).

2. It suffices to note that any (infinite) sequence {xk_} generated by Algorithm 3.2 satisfies

F(xk+1) < F(xk)for allk.

We now study the speed of convergence to zero of the sequences_{vk_}and_{θ (xk)_}.

Lemma 3.9.Suppose that_{F(xk)_}isRm₊-bounded from below byy. Then_¯

∞

k₌0

tk|θ (xk)|<∞ and

∞

k₌0

tkvk2<∞.

Proof. We know thatFi(xk+1)≤ Fi(xk)+σtk∇Fi(xk)⊤vkfor anyiand allk. Adding up from k₌0 tok₌N, we get

Fi(xN+1) _≤ Fi(x0)₊σ

N

k=0

tk_∇Fi(xk)⊤vk

≤ Fi(x0)₊σ

N

k₌0

tkϕ_xk(vk)

= Fi(x0)+σ N

k=0 tk

θ (xk)₋1 2v

k

2

(14)

where we used (4) and (8) forx₌xk. Asy¯i ≤ Fi(xk)andθ (xk) <0 for alli,k(Proposition 3.1, 3), we obtain

N

k₌0 tk

|θ (xk)_{| +}1 2v

k

2

≤ 1 σ

Fi(x0)− ¯yi.

Since the last inequality holds for any nonnegative integer N, the conclusions follow

immedi-ately.

Before stating and proving the last theorem of this section, let us introduce the main tool for the convergence analysis. Recall that a sequence{uk_{} ⊂} Rnisquasi-Fej´er convergentto a set U _⊂Rnif for everyu _∈U there exists a sequence_{εk} ⊂R,εk ≥0 for allkand such that

uk+1₋u2_≤uk₋u2₊εk for allk=0,1, . . . , with

∞

k=0

εk <∞.

We will need the following well-known result concerning quasi-Fej´er convergent sequences, whose proof can be found in [7].

Theorem 3.10.If a sequence_{uk_}is quasi-Fej´er convergent to a nonempty set U _⊂Rn, then_{uk_} is bounded. If, furthermore,_{uk_}has a cluster point u which belongs to U , thenlimk_→∞uk=u.

In [7] it is proved that the steepest descent method for smooth (scalar) convex minimization, with step size obtained using the Armijo rule implemented by a backtracking procedure, is globally convergent to a solution (essentially, under the sole assumption of existence of optima). Using the same techniques, we extend this result to the multiobjective setting, that is to say, we show that any sequence produced by theMSDMconverges to an optimum of problem (1), no matter how poor the initial guess might be.

Theorem 3.11.Suppose that F isRm₊-convex and that Assumption 3.6 holds. Then any sequence

{xk_}produced by Algorithm 3.2 converges to a weak Pareto optimal point x∗_∈Rn.

Proof. As_{F(xk)_}is anRm₊-decreasing sequence, by Assumption 3.6, there existsx_ˆ _∈Rnsuch that

F(x_ˆ)_≤F(xk) for allk₌0,1, . . . (21)

Now observe that 0<tk_≤1 for allk, so

xk+1₋xk2_≤ 1 tk

xk+1₋xk2₌tkvk2 for allk₌0,1, . . .

Therefore, from (21), Lemma 3.9 and the above inequality, it follows that

∞

k=0

xk₋xk+12<_∞. (22)

Define

(15)

Using theRm₊-convexity ofFand (21), from Lemma 3.7, we see that for anyx_∈L

x₋xk+12_≤x₋xk2₊xk₋xk+12 for allk₌0,1, . . .

AsL is nonempty, becausex_ˆ _∈ L, from (22) and the above inequality, it follows that_{xk_}is quasi-Fej´er convergent to the setL. Therefore,{xk_}has accumulation points. Let x∗be one of them. By Lemma 3.8, x∗ _∈ L. Then, using Theorem 3.10, we see that the whole sequence

{xk_}converges tox∗. We finish the proof by observing that Theorem 3.3 guarantees thatx∗is stationary and so, from theRm₊-convexity of F, by virtue of Lemma 3.5,x∗ is a weak Pareto

optimum.

3.5 TheMSDMwith relative errors

Here we briefly sketch an inexact version of theMSDM. Instead of computingvk ₌v(xk)₌ argmin_v_∈_Rnϕ_xk(v)+1₂v2in Step 2 of Algorithm 3.2, we accept approximate solutions of the

problem with some prespecified tolerance. For a nonstationary pointx _∈Rnand a given toler-anceε_{∈ [}0,1), we say that a directionv_∈Rnis anε-approximate solutionof minv∈Rnϕ_x(v)+

1 2v2if

ϕx(v)+ 1 2v

2

≤(1−ε)θ (x),

whereϕx(v) =maxi=1,...,m∇Fi(x)⊤vandθ (·)is the optimal value of the above optimization problem. Callinghxthe objective function in this problem, i.e.,hx(v)_:=ϕx(v)+1₂v2, we see thatvis anε-approximate solution if, and only if, the relative error ofhxatvis at mostε, that is to say,

hx(v)−hx(v(x))

hx(v(x))

=

hx(v)−θ (x)

θ (x)

≤

ε.

Clearly, the exact directionv(x) is alwaysε-approximate at x for any ε _{∈ [}0,1). Moreover, v(x)is the unique 0-approximate direction at x. Also note that given a nonstationary pointx _∈

Rn andε _{∈ [}0,1), an ε-approximate directionv is always a descent direction. Indeed, from Proposition 3.1 and the definition ofε-approximate solution, it follows that∇Fi(x)⊤v_≤ϕx(v)+

1 2v

2

≤(1−ε)θ (x) <0 for alli. Whence, J F(x)v <0 and so, by Lemma 2.1,vis a descent direction forFatx.

We can now define the inexactMSDMin almost the same way as its exact counterpart: in Step 1, there is also the choice of the tolerance parameter ε _{∈ [}0,1), in Step 2, vk is taken as anε -approximate solution of minv∈Rnϕ_xk(v)+1₂v2and, finally, the stopping criterion in Step 3

becomesh_xk(vk)=0. In [19, 28] it is shown that the method is well-defined and that it has the

same convergence features as the exact one.

4 THE MULTIOBJECTIVE PROJECTED GRADIENT METHOD

(16)

finally, we sketch a numerically robust version of the method which admits relative errors on computing the search directions at each iteration.

LetF_:Rn→Rm be a continuously differentiable mapping andC _⊆Rnbe a closed and convex set. We are now interested in the followingconstrained multiobjective optimizationproblem:

minimize F(x)

subject to x_∈C. (23)

For this problem, we seek aPareto optimum(orefficientpoint), i.e., afeasiblepoint (x∗ _∈ C) such that there does not existx _∈C withF(x)_≤ F(x∗)andF(x)₌ F(x∗), or we look for a weak Pareto optimum(orweakly efficientpoint), that is to say, a feasiblex∗such that there does not existx _∈CwithF(x) <F(x∗).

4.1 The search direction for the multiobjective projected gradient method

For the constrained multiobjective problem, a necessary condition for optimality of a pointx¯_∈C isstationarity(orcriticality):

J F(x_¯)(C_{− ¯}x)_{∩ [−}Rm₊₊] = ∅,

whereJ F(x_¯)(C_{− ¯}x)_{:= {}J F(x_¯)(x_{− ¯}x)_: x_∈C_}. Sox¯ _∈CisstationaryforFif, and only if, for allv_∈C_{− ¯}xwe have J F(x_¯)v<0. Note that whenm₌1, we retrieve the classical stationarity condition for constrained scalar-valued optimization:∇F(x_¯),x_{− ¯}x_≥0 for allx_∈C.

Let us now develop an extension of the classical (scalar) projected gradient method for the vector-valued problem (23). First, forx _∈ C, we extend the search direction, originally defined for C ₌Rnin (7), by

v(x)_:=argmin

v∈C−x

βϕx(v)+ 1 2v

2_,

(24)

whereβ >0 is a parameter andϕx is defined in (4). Note that, in view of the strong convexity of the minimand, for anyx_∈C, theprojected gradient direction v(x)is always well-defined.

Now we extend the optimal value functionθ (_·)(given by (8) in the unconstrained case):

θ (x)_:= min

v_∈C₋xβϕx(v)+ 1 2v

2

=βϕx(v(x))+ 1 2v(x)

2_. ₍₂₅₎

We also need the following extension of Proposition 3.1 to the constrained case.

Proposition 4.1.Letv_:C _→Rnandθ_:C _→Rbe given by(24)and(25), respectively. Then, the following statements hold.

1. θ (x)_≤0for all x_∈C.

2. The functionθ (_·)is continuous.

(17)

(a) The point x _∈C is nonstationary.

(b) θ (x) <0.

(c) v(x)₌0.

In particular, x is stationary if and only ifθ (x)₌0.

Proof. 1. Recalling that 0∈C₋x, the result follows as in the proof of Proposition 3.1. 2. Letx_∈Cand_{x(k)_{} ⊂}Cbe a sequence such that limk_→∞x(k)=x. Sincev(x)∈C−x, we havev(x)₊x₋x(k)_∈C₋x(k). Thus, from (24) and (25), we have

θ (x(k))_≤βϕ_x(k)(v(x)+x−x(k))+ 1

2v(x)+x−x

(k)

2.

Hence, from the first inequality of (5), we obtain

θ (x(k)) _≤ βϕ_x(k)(v(x))+βϕ_x(k)(x−x(k))

+1

2v(x) 2₊1

2x−x

(k)2₊_v(_x_),_x₋_x(k)_.

Since F is continuously differentiable andu _→ maxi{ui}is continuous, taking lim supk→∞

on both sides of the above inequality yields

lim sup k_→∞

θ (x(k))_≤θ (x) . (26)

Now, observe that sincev(x(k))_∈C₋x(k), we havev(x(k))₊x(k)₋x _∈C₋x. Once again from (24) and (25), we obtain

θ (x)_≤βϕxv(x(k))+x(k)−x+ 1 2v(x

(k)₎

+x(k)₋x2.

Using once again the first inequality of (5), we have

θ (x) _≤ βϕx

v(x(k))₊βϕx

x(k)₋x

+1

2v(x

(k)₎

2+1

2x

(k)

−x2₊v(x(k)),x(k)₋x.

Note that for alli and anyz_∈C, from the Cauchy-Schwarz inequality and item 1, we have that

−∇Fi(z)v(z) +₂1z2 ≤θ (z)≤ 0. Therefore, since Fis continuously differentiable and

{x(k)_}is bounded, because it converges, we see thatv(x(k))is bounded for allk. So, taking lim infk_→∞on both sides of the above inequality, we get

θ (x) _≤ lim inf k_→∞

βϕxv(x(k))+ v(x(k))/2

= lim inf k→∞

θ (x(k))₊βϕxv(x(k)−ϕx(k)

v(xk)

≤ lim inf k→∞

θ (x(k))₊βJ F(x)₋J F(x(k))v(x(k))

≤ lim inf k→∞ θ (x

(18)

where the second inequality follows from (5). We complete the proof by combining this inequal-ity with with (26).

3. All results follow as in the proof of Proposition 3.1. (A formal proof of this equivalences for

a more general case can be found in [25, Proposition 3].)

4.2 The multiobjective projected gradient algorithm

We can now define the extension of the classical (scalar) projected gradient method for the con-strained multiobjective optimization problem (23).

Algorithm 4.2.The multiobjective projected gradient method (MPGM)

1. Chooseβ >0,σ _∈(0,1)andx0_∈C. Setk_:=0.

2. Computevk _:=v(xk)₌argmin

v∈C−xk

βϕ_xk(v)+

1 2v

2 ,

whereϕ_xk(v):= max

i=1,...,m∇Fi(x k₎⊤_v_.

3. Computeθ (xk)₌βϕ_xk(vk)+

1 2v

k

2. Ifθ (xk)₌0, then stop.

4. Choosetkas the largestt ∈ {1/2j: j =0,1,2, . . .}such that

Fxk₊tvk_≤F(xk)₊σt J F(xk)vk.

5. Setxk+1_:=xk₊tkvk,k:=k+1 and go to Step 2.

Note that, wheneverC ₌Rn, takingβ ₌1, we retrieve theMSDM. Observe that, by virtue of Proposition 4.1, an alternative stopping criterion in Step 3 isvk ₌0. Note also that Algorithm 4.2 either terminates in a finite number of iterations with a stationary point or it generates an infinite sequence of nonstationary points (Proposition 4.1).

Now observe that if Step 4 is reached, by virtue of Lemma 2.1, in finitely many half reductions, we obtain the step length which is used in Step 5 to define the next iterate. Note that instead of the factor 1/2 in the reductions of Step 4, we could have used any other scalarν_∈(0,1). Finally, observe that sinceθ (xk_{) <}_{0 implies}_{J F}₍_xk_)vk _<_{0, from Steps 3-5, it follows that whenever} the method generates and infinite sequence, the objective values areRm₊-decreasing, i.e.,

F(xk+1) < F(xk) for allk₌0,1, . . .

If the method stops at iterationk0, this inequality hold fork=0,1, . . . ,k0−1.

4.3 Convergence analysis of theMPGM: the general case

(19)

Lemma 4.3.Let_{xk_{} ⊂}Rnbe a sequence generated by Algorithm 4.2. Then, we have_{xk_{} ⊂}C.

Proof. Let us use an inductive argument. From the initialization of the algorithm, x0 _∈ C. Now assume thatxk _∈C. Observe that, by Step 4 of Algorithm 4.2,tk ∈(0,1]and, by Step 2,

vk_∈C₋xk. Then, for somezk_∈C,vk₌zk₋xkand from the convexity ofC, it follows that xk+1 ₌xk₊tkvk ₌(1−tk)xk₊tkzk _∈Cand the proof is complete.

Now we see that whenever a sequence produced by Algorithm 4.2 has accumulation points, they are all feasible stationary points for the problem.

Theorem 4.4. Every accumulation point, if any, of a sequence_{xk_}generated by Algorithm 4.2 is a feasible stationary point.

Proof. The feasibility follows combining the fact thatCis closed with Lemma 4.3. The rest of the proof is similar to the one of Theorem 3.3: it suffices to use the definitions ofv(x)andθ (x)

given in (24) and (25), respectively.

4.4 Full convergence of theMPGM: theR+m-convex case

Let us note that, as being the maximum of convex functions, the objective of the minimization problem which definesv(x)in (24) is also convex and, as we also have thatC₋xis a closed convex subset ofRn, by [3, Proposition 4.7.2], a necessary condition for the optimality ofv(x) is the existence ofw(x)_∈Rm with

wi(x)≥0 and m

i₌1

wi(x)=1 (27)

such that

β

m

i=1

wi(x)∇Fi(x)+v(x), v−v(x)

≥0 for allv_∈C₋x. (28)

This condition can be written as

x₋β

m

i=1

wi(x)∇Fi(x)−x+v(x),u−x+v(x)

≤0 for allu_∈C.

Now using a very well-known characterization of orthogonal projections (see, for instance, [3, Proposition 2.2.1]), we have

x₊v(x)₌PC

x₋β

m

i=1

wi(x)∇Fi(x)

,

where PC(z) _:= inf{u ₋z_: u _∈ C_}is well-defined because C is closed and convex. We conclude that there existsw(x)_∈Rm satisfying (27) such that

(20)

Observe that in the constrained scalar-valued case (m ₌ 1), we have J F(x)⊤ _{= ∇}F(x)and w(x)₌1, which means that

v(x)₌PC(x₋β_∇F(x))₋x,

and this is precisely the search direction used in the projected gradient method for real-valued op-timization. So this is an extension to the multiobjective setting of the classical projected gradient method.

Lemma 4.5. Suppose that F is Rm₊-convex. Let _{xk_} be an infinite sequence generated by Algorithm 4.2. If forx_ˆ_∈C and k_≥0, we have F(x_ˆ)_≤F(xk), then

xk+1_{− ˆ}x2_≤xk_{− ˆ}x2₊2βtk

wk,J F(xk)vk ,

wherewk_:=w(xk₎_∈_Rm _{is such that}₍₂₇₎_and₍₂₉₎_{hold with x}₌_xk_.

Proof. Sincexk+1₌xk₊tkvk, we have

xk+1_{− ˆ}x2₌xk_{− ˆ}x2₊t_k2vk2₋2tk

vk,x_ˆ₋xk. (30)

Let us analyze the rightmost term of the above expression. Sincevk ₌v(xk)andwk ₌w(xk), by (28) withx ₌xk, we get

βJ F(xk)⊤wk₊vk, v₋vk

≥0 for allv_∈C₋xk.

Takingv_{= ˆ}x₋xk _∈C₋xkin the above inequality, we obtain

−

vk,x_ˆ₋xk

≤β

wk,J F(xk)(x_ˆ₋xk)

−β

wk,J F(xk)vk

− vk2. (31)

Now, observe that, the convexity ofF and (18) yieldJ F(xk₎₍_x_ˆ₋_xk₎ _≤ _F₍_x_ˆ₎₋_F₍_xk₎_{. This} fact, together withwk_≥0, andF(x_ˆ)_≤ F(xk)implies

wk,J F(xk)(x_ˆ₋xk)

≤

wk,F(x_ˆ)₋F(xk)

≤0.

Also, since J F(xk)vk < 0, becausexk is nonstationary, we havewk,J F(xk)vk < 0. Thus, recalling thatβ >0, from (31) it follows that

−

vk,x_ˆ₋xk

≤βwk,J F(xk)vk

− vk2.

The above inequality, together with (30), shows that

xk+1_{− ˆ}x2_≤xk_{− ˆ}x2₊t_k2vk2₊2βtk

wk,J F(xk)vk

−2tkvk2.

And the result follows becausetk _≥0.

(21)

Assumption 4.6. Every Rm₊-decreasing sequence _{yk_{} ⊂} F(C) _{:= {}F(x)_: x _∈ C_}is Rm₊ -bounded from below by an element of F(C).

We can now state and prove the main result on the convergence analysis of theMPGM: under similar conditions of those of Theorem 3.11, regardless of the initial (feasible) guess, the method converges to an optimum of problem (23).

Theorem 4.7. Assume that F_: Rn →Rm isRm₊-convex and that Assumption 4.6 holds. Then, any sequence_{xk_}generated by Algorithm 4.2 converges to a weak Pareto optimum.

Proof. Let us define the set

L _:=x_∈C_: F(x)_≤ F(xk)for allk,

and take xˆ _∈ L, which exists by Assumption 4.6. Since F is Rm₊-convex, it follows from Lemma 4.5 that

xk+1_{− ˆ}x2_≤xk_{− ˆ}x2₊2βtk

wk,J F(xk)vk

for allk. (32)

First, we prove that the sequence{xk_}is quasi-F´ejer convergent to the setL. Let{e1, . . . ,em_}be the canonical basis ofRm_{. We observe that, for each}_k_{, we can write}

wk₌

m

i=1

wk_iei.

Let us note that, from (27), 0≤w_ik_≤1 for alliandk. Thus, from (32), we have

xk+1− ˆx

2

≤xk− ˆx

2

+2βtk m

i=1

ei,J F(xk)vk

for allk.

Definingεk:=2βtkmi₌1|ei,J F(xk)vk|, we have thatεk≥0 and

xk+1− ˆx

2

≤xk − ˆx

2

+εk.

Thus, let us prove that∞

k₌0εk<∞. From the Armijo-like condition in Step 4 of the algorithm, we obtain

−tkei,J F(xk)vk ≤ 1 σ

Fi(xk)−Fi(xk+1)

.

Sincexk is nonstationary, we also have for alli andkthat

ei,J F(xk)vk_{= ∇}Fi(xk)⊤vk<0,

which means that−tkei,J F(xk)vk =tk|ei,J F(xk)vk|. Hence, we obtain

εk ≤ 2β

σ

m

i=1

(22)

Adding up fromk₌0 tok₌N at the above inequality, whereN is any positive integer, we get

N

k=0

εk≤ 2β

σ

m

i=1

Fi(x0)₋Fi(xN+1).

Since N is an arbitrary positive integer, β, σ > 0 and F(x_ˆ) _≤ F(xk)for allk, we conclude that∞

k=0εk <∞. Therefore, recalling thatxˆwas an arbitrary element ofL, we see that{xk} converges quasi-F´ejer toL.

By Theorem 3.10, it follows that{xk_}is bounded. So, sinceC is closed,{xk_{} ⊂}C has at least one feasible accumulation point, sayx∗, which is feasible and stationary by Theorem 4.4. Using now theRm

+-convexity ofF, from Lemma 3.5, it follows thatx∗ ∈Cis a weak Pareto optimum.

As in the end of the proof of Theorem 3.11, we now see thatx∗_∈L. SinceF(xk+1) < F(xk)for allk, by Lemma 3.8, 1,F(x∗)_≤ F(xk)for allk, andx∗_∈L. So, once again from Theorem 3.10, we conclude that_{xk_}_{converges to}_x∗_∈_C_{, a weak Pareto optimum.}

4.5 TheMPGMwith relative errors

As we did for theMSDM in Subsection 3.5, we present here a quick overview of an MPGM version implemented with relative errors in the search directions. Given a nonstationaryx _∈C and a toleranceε _{∈ [}0,1), we say thatv _∈ C₋x is anε-approximate solutionof the problem minv∈C−xϕx(v)+1₂v2if

hx(v)≤(1−ε)θ (x),

wherehx(v)_:=ϕx(v)+1/2v2, withϕxgiven by (4).

Similar comments as those made in Subsection 3.5 forε-approximate solutions for unconstrained problems can be made here for the constrained case. The modification of the (exact)MPGMare basically the same as those proposed in Subsection 3.5 for the (exact) MSDM: in Step 2 of Algorithm 4.2, instead of definingvk as the exact solution of min_v_∈C₋xkϕ_xk(v)+1/2v2, we

takevkas anε-approximate solution of this problem. The stopping rule at Step 3 (θ (xk)₌0) is replaced byh_xk(vk)=0.

For a more general case (vector optimization), a comprehensive study of the inexact version of theMPGMcan be found in [22], where, under essentially the same hypotheses made here for the (exact)MPGM, all convergence results are fully proved.

5 THE MULTIOBJECTIVE NEWTON METHOD

We now go back to the unconstrained multiobjective problem (1). In order to define a Newton-type search direction for this kind of problems, we first recall some convexity notions for vector-valued mappings. We say that F_:Rn → Rm is strictly Rm₊-convex if the inequality (17) is satisfied strictly, that is, if F is componentwise strictly convex. We also say thatF isstrongly

Rm₊-convexif there existsuˆ _∈Rm₊₊such that

F(λx₊(1−α)y)_≤λF(x)₊(1−λ)F(y)₋1

2λ(1−λ)x−y 2

ˆ