Comput. Appl. Math. vol.29 número2

(1)

ISSN 0101-8205 www.scielo.br/cam

Error bound for a perturbed minimization problem

related with the sum of smallest eigenvalues

MARCOS VINICIO TRAVAGLIA

Departamento de Matemática, Centro de Ciências da Natureza Universidade Federal do Piauí, 64049-550 Teresina, PI, Brazil

E-mail: [email protected]

Abstract. LetC be an_×nsymmetric matrix. For each integer 1 _≤ k < nwe consider the minimization problemm(ε) _:=min_X{Tr_{C X_{} +}εf(X)}. Here the variableXis ann_×n

symmetric matrix, whose eigenvalues satisfy

0_≤λi(X)≤1 and n X

i=1

λi(X)=k,

the numberεis a positive (perturbation) parameter and f is a Lipchitz-continuous function (in general nonlinear). It is well known that whenε ₌0 the minimum value,m(0), is the sum of the smallestkeigenvalues ofC.

Assuming that the eigenvalues ofCsatisfyλ₁(C)_{≤ ∙ ∙ ∙ ≤}λ_k(C) < λ_k

+1(C)≤ ∙ ∙ ∙ ≤λn(C),

we establish the following upper and lower bounds for the minimum valuem(ε):

k X

i=1

λ_i(C)₊εfˉ_≥m(ε)_≥

k X

i=1

λ_i(C)₊εfˉ₋ 2k L

2 λ_k₊₁(C)₋λ_k(C)ε

2_,

where fˉis the minimum value of f over the solution set of unperturbed problem andL is the Lipschitz-constant of f. The above inequality shows that the error by replacing the upper bound (or the lower bound) by the exact value is at least quadratic in the perturbation parameter. We also

treat the case thatλ_k

+1(C)=λk(C). We compare the exact solution with the upper and lower

bounds for some examples.

Mathematical subject classification: 15A42, 15A18, 90C22.

Key words: matrix analysis, sum of smallest eigenvalues, minimization problem involving matrices; nonlinear perturbation; semidefinite programming.

(2)

1 Introduction

We denote byMnthe set ofn×nreal matrices. The sum of diagonal ofC ∈ Mn

is denoted by Tr_{C_}. If C is a symmetric matrix then the sum of their first (smallest)keigenvalues,Pk

i=1λi(C)(multiplicity included), can be obtained as

the minimum value of the following minimization problem with matrix vari-able (see the proposition 2.3):

min

Tr_{C X_}: Tr_{X_{} =}k,I ₋X 0 and X 0 ₌

k

X

i=1

λ_i(C). (1)

The notation X 0 means thatX is ann_×n(symmetric) positive semidefi-nite matrix, that is,Xis symmetric and its eigenvalues are nonnegative. We write Y X when the difference matrixY ₋ X 0. We denote by I the identity matrix.

Note that the objective function, Tr_{C X_}, and the restriction, Tr_{X_{} =}k, of the minimization problem (1) are linear in the variableX. On the other hand, the restrictionsI ₋X 0 and X 0 are convex but nonlinear. Due to these last two restrictions the problem (1) is, in general, not aLinear Programming(LP). We denote byK_{:= {}_X _∈ _M_n_:_Tr_{_X_{} =}_k_,_I₋_X _{0 and}_X ₀_}_{the domain} of the objective function of the minimization problem (1). This means that the elements ofK_{are symmetric matrices, whose eigenvalues satisfy: 0} _≤_λ

i(X)≤

1 fori =1,2, . . . ,nandPn

i=1λi(X)=k.

In this work we propose to study a nonlinear perturbation of the problem (1) defined as follows:

m(ε)_:=minTr_{C X_{} +}εf(X)_: X _∈K _, ₍₂₎

whereε (perturbation parameter) is a nonnegative real number and f_:K _→

R is a function that is, in general, nonlinear. The function εf(X) is called perturbation. We denote by m(ε)the minimum value of the problem (2). In general the solution of (2) depends on ε. Moreover, according to (1) one has m(0)₌Pk

i=1λi(C).

(3)

Semidefinite Programming: In particular casek ₌1 the restrictionI₋X 0 can be dropped in (1). To see this, note that ifλ_i(X) _≥ 0 andPn

i=1λi(X) =

Tr_{X_{} =}1 then 0_≤λi(X)≤1. Hence,I−X 0. The minimization problem

(k ₌1)

min

Tr_{C X_}: Tr_{X_{} =}1 and X 0 (3)

is a special case of the following problem:

minTr_{C X_}:Tr_{A_iX_{} =}b_i withi ₌1, . . . ,pandX 0 , (4)

which is called standardsemidefinite programming(SDP). In (4) A_i aren _×n symmetric matrices. The SDP has many applications in Combinatorial Opti-mization. See [5] for a survey on SDP and [6] for the relation between SDP and Eigenvalue Optimization.

A bit more general case of (3) is min_{Tr_{C X_}: Tr_{A X_{} =} b and X 0_} with A≻0 (that is, Ais strict positive definite) andb >0. We can reduce this case to (3) by pluggingC˜ _:=b A−1/2_{C A}−1/2_{in the place of}_C_{in (3).}

Strictly Convex Perturbation of SDP:It is convenient to add a strictly con-vex functionεf(x)in order to solve the unperturbed problem (4). This makes the perturbed minimization problem strictly convex. Hence, it has only one minimizer. One hopes that this minimizer approximates to a solution of (4) as ε _→ 0. The interior point methods [10], for solve SDPs, make use of the strictly convex function εf(X) _{= −}εlog det(X), which is a log-barrier. Since this function is not Lipschtiz-continuous our error bound can not be used. On the other hand, the authors [7] arrived at an algorithm for SDPs that has several advantages over existing techniques using the perturbation functions εf(X)_{= −}εlog det(X)andεf(X)₌ε1₂Tr_{X2

}. The last one is strictly convex and Lipschitz-continuous. As the main advantage, this algorithm is a first-order method, which makes it scalable.

2 Results

(4)

and 0 _≤ u(ε)₋m(ε) _≤ u(ε)₋ℓ(ε). Under some condition on f we prove thatu(ε)₋ℓ(ε) ₌ constant ε2 (see theorem 2.8). Hence, both m(ε)₋ℓ(ε) andu(ε)₋m(ε) go at least quadratic to zero asε goes to zero. We interpret u(ε)₋ℓ(ε)as an error bound. In contrast to the minimization problem (2), the authors [1] proved that for perturbed LP (see section 4)u_LP(ε)₋ℓ_LP(ε)₌0 for all 0 _≤ ε _≤ ε_o with someε_o > 0. This means that for perturbed LP there is no error if we require thatε be small enough. This is, in general, not the case for the minimization problem (2) (see example 1 of section 3). The author [9] derived error bound for perturbed LP when f is strictly convex, and in [8] the perturbation results of [1] were extend to convex programmings.

As in [1], we assume that f_:K_→_R_{be a Lipschitz-continuous function, that} is, there is a constant 0 _≤ L < _∞such that for all X,Y _∈ K _{the inequality} |f(X)₋ f(Y)_{| ≤} L_kX ₋Y_kis true. Throughout this paper_{k ∙ k} denotes the Frobenius-norm, that is, for X _∈ M_n we define

kX_{k :=}pTr_{XT_X

} =

v u u t

n

X

i,j=1 X_i2_,_j.

It is well known that for a symmetric matrixXthe equality_kX_{k =}

q Pn

i=1λ 2

i(X)

holds.

The following proposition establishes that the set K_{is a bounded subset of}

M_n _≃Rn×n. Hence,K_{is compact.}

Proposition 2.1. The setK_{is a subset of the ball of M}_n_{with size}√_{k, that is,}

if X _∈K_then_k_X_{k ≤}√_k.

Proof. ForX _∈K_{we have 0}_≤_λ

i(X)≤1 fori =1, . . . ,n. Therefore

kX_k2₌

n

X

i=1

λ2_i(X)_≤ n

X

i=1

λ_i(X)₌Tr_{X_{} =}k

Since K _{is compact and Tr}_{_{C X}_} _{is continuous the minimum value of the} problem (1) is attained.

(5)

Definition 2.2 (The values d,r,s, λ_r(C), and λ_s(C)). Consider the eigen-values of the symmetric matrixC _∈ Mn in increasing order, that is, λ₁(C) ≤ λ₂(C)_{≤ ∙ ∙ ∙ ≤}λ_n(C). For eachk₌1,2, . . . ,nwe define:

d ₌d(C,k)_:=♯

i _{∈ {}1, . . . ,n_}: λ_i(C)₌λ_k(C)

Note thatd is multiplicity (degeneracy) ofk-th eigenvalue. Further we define:

R(C,k)_:=

i _{∈ {}1, . . . ,n_}: λ_i(C) < λ_k(C) ,

S₍_C_,_k₎_:=_i _{∈ {}₁_{, . . . ,}_n_}:_λ

i(C) > λk(C) ,

r ₌r(C,k)_:=

(

max_{i_:i_∈R₍_C_,_k₎_} _if R₍_C_,_k₎_{6= ∅}

0 if R₍_C_,_k₎_{= ∅}_, (5)

and

s ₌s(C,k)_:=

(

min_{i_:i_∈S₍_C_,_k₎_} _if S₍_C_,_k₎_{6= ∅}

n₊1 if S₍_C_,_k₎_{= ∅}_. (6)

In the caser ₌ 0 we do the following convention λ_r(C) ₌ λ₀(C) _{:= −∞} and in the case s ₌ n ₊ 1 we do λ_s(C) ₌ λ_n₊₁(C) _{:= +∞}. Note that s₋r₋1₌d holds true.

The following example illustrates the above definition:

Example. Forn=10 andk=6. If the eigenvalues ofCare:

a) 2,3,4,4,4,4,4,4,11,12 thenλ_k ₌ 4, d = 6, r = 2, s = 9, λ_r ₌ 3 andλ_s ₌11.

b) 4,4,4,4,4,4,4,4,11,12 thenλ_k ₌4,d ₌8,r ₌0,s ₌9,λ_r _{= −∞} andλ_s ₌11.

c) 2,2,2,3,4,4,4,4,4,4 then λk = 4, d = 6, r = 4, s = 11, λr = 3

andλ_s _{= +∞}.

(6)

Proposition 2.3. The minimum value of the problem(1)is the sum of the small-est k eigenvalues of the symmetric matrix C. Moreover, the set of its minimizers is _





V





I_r | 0 | 0

− − − − − −

0 | Z | 0

− − − − − −

0 | 0 | 0



V

T

: Z _∈K d

 



, (7)

where

K d :=

Z _∈ M_d with Tr_{Z_{} =}k₋r and I Z 0 . (8)

Here d ₌ d(C,k)is the multiplicity of the k-th eigenvalue of C and V is any n_×n orthogonal matrix whose columns are the eigenvectors (in increasing order of their eigenvalues) of the matrix C.

The matrix I_r in(7) is the identity matrix of order r , where r ₌ r(C,k) is defined by(5). If r ₌0 then this identity matrix does not appear in(7). The block matrix0in the diagonal of(7)is of order n₊1₋s, where s₌s(C,k)is defined by(6). If s =n+1then this block matrix0does not appear in(7).

If d ₌ 1then there is only one minimizer, namely: Pk

i=1viviT (orthogonal

projection with rank k), wherev_iis any (normalized) eigenvector corresponding to the i -th eigenvalue of C.

Proof. See appendix A.

Remark 2.4. In order to simplify the notation we mean by A_⊕B(direct sum) the block matrix

A | 0

− − − −

0 | B

. That is, A_⊕Bis a square matrix of orderna+nb

ifAandBare square matrices of orderna andnbrespectively.

Since the objective function Tr_{C X_{} +}εf(X)is the sum of two continuous functions on the compact setK_{its minimum value,}_m(ε), is attained. We denote byX

∗(ε)the set argmin{Tr{C X} +εf(X): X ∈ K} (the set of minimizers).

The proposition 2.3 states thatm(0)₌Pk

i=1λi(C)and X

∗(0)=

V(Ir ⊕Z⊕0)VT: Z ∈ Md with Tr{Z} =k−r andI Z 0 .

An important value of f in order to study the relation betweenm(ε)andm(0) is fˉ_:=min_{f(X)_: X _∈X

(7)

We now establish the following upper bound form(ε):

Proposition 2.5 (Upper bound). For the minimum value m(ε) of the prob-lem(2)the following upper bound holds:

u(ε)_:=m(0)₊εfˉ_≥m(ε)

for allε _≥0.

Proof. Take Y_∗ ∈ argmin{f(X)_: X ∈ X

∗(0)}. Since Y∗ ∈ K we have by

definition ofm(ε)that

m(ε) _≤ Tr_{CY_∗_{} +}εf(Y_∗)

= m(0)₊εf(Y_∗) since Y_∗_∈X ∗(0)

= m(0)₊εfˉ

Remark 2.6. In the cased(C,k)₌1 we have fˉ= f Pk

i=1viv T i

because

X ∗(0)=

P_:=

k

X

i=1

v_iv_iT

(there is only one minimizer). Hence,u(ε)₌m(0)₊εf(P)is an upper bound. On the other hand, in the case d(C,k) > 1, the function U(ε) _:= m(0) ₊ εf(P) ₌ Tr_{C P_{} +}εf(P)is clearly an upper bound form(ε). If f(P) > fˉ thenU(ε) >u(ε)forε >0. Therefore, the upper boundU(ε)is not interesting because when we are estimating an error we try, in general, to find smaller upper bounds and larger lower bounds.

We can easily obtain a crude lower bound form(ε)as the following:

Proposition 2.7 (A linear error bound). For the minimum value of (2) the following upper and lower bounds hold:

m(0)₊εfˉ≥m(ε)_≥m(0)₊εfˉ−2√k Lε

(8)

Proof. The upper bound,u(ε)_:=m(0)₊εfˉ, was proved in proposition 2.5. To prove the lower bound takeX_∗ _∈X_∗_(ε)_and_Y_∗ _∈_argmin_{_f₍_X₎_: _X _∈X_∗₍₀₎_}_. Hence

m(ε)₋u(ε) _:= m(ε)₋m(0)₋εfˉ

= Tr_{C X_∗_{} +}εf(X_∗)₋Tr_{CY_∗_{} −}εf(Y_∗)

= Tr{C X_∗} −Tr{CY_∗}+ε

f(X_∗)₋ f(Y∗)

≥ 0₊ε

f(X_∗)₋ f(Y_∗)

since _Y

∗∈X∗(0)

and X_∗∈K

≥ −ε_|f(X_∗)₋ f(Y_∗)_|

≥ −εL_kX_∗₋Y_∗_k since f is Lipschitz

≥ −εL_kX_∗_{k + k}Y_∗_k ≥ −εL(√k₊√k)

In last step we used that X_∗ andY_∗ _∈ K_{and the proposition 2.1. This proves}

the proposition 2.7.

The main result of this work is the following theorem, which improves the proposition 2.7:

Theorem 2.8 (A quadratic error bound). If f is a Lipschitz-continuous func-tion with Lipschitz-constant L then the following upper and lower bound hold:

m(0)₊εfˉ_≥m(ε)_≥m(0)₊εfˉ₋ L 2

α(C,k)ε

2 for all0_≤ε. Hereα(C,k)is strict positive and given by

α(C,k)₌ 1 2min

λ_k(C)₋λ_r(C)

s₋(k₊1) ,

λ_s(C)₋λ_k(C)

k

, (9)

where the values r=r(C,k)and s=s(C,k)are given by the definition2.2.

Example. In the last example (n₌10, andk ₌6) we have for a) α(C,6)₌ 1/4, b) α(C,6)₌7/12, c) α(C,6)₌1/8 and d) α(C,6)_{= +∞}.

Remark 2.9. In the particular case, where the eigenvalues of C are ordered asλ1≤λ2≤λk < λk+1≤λn, that is,s=k+1, we haveα=

λ_k₊₁−λ_k

(9)

Remark 2.10. In the particular casek ₌1 (_⇒r ₌0) we obtain: α(C,1)₌ λ1+d(C,1)−λ1

2 ,

whered(C,1)is the multiplicity (degeneracy) ofλ₁(C). That is, ifk ₌1, then αcan be taken as the half of the gap of the first eigenvalue.

Remark 2.11. The expression (9) for the strict positive constant α(C,k) is obtained in lemma B.1 (see appendix B). The lemma B.1 states that α(C,k) given by (9) satisfies the inequality

Tr{C X} −

k

X

i=1

λ_i(C)_≥α(C,k) min

Y∈X ∗(0)

kX−Yk2 (10)

for allX _∈K_.

We comment two cases whereα(C,k)_{= +∞}.

Remark 2.12. Ifd ₌ n then one hasC ₌ λ₁(C)I (a multiple of the iden-tity matrix). Note that in this case we haver ₌ 0 and s ₌ n₊1. Hence, by definition (9)α(C,k) _{= +∞}. Note that the left hand side (LHS) of (10) is zero since Tr_{C X_{} =} kλ₁(C) for X _∈ K_. _{This means that} _X

∗(0) = K_{. The RHS of (10) is also zero because in that case} _X

∗(0) = K.

There-fore, the largest (the best)α for which the inequality (10) holds is_+∞. Note also that, in the case d ₌ n, the equality m(ε) ₌ m(0)₊εfˉ holds for all ε _≥ 0 since X_∗(0) ₌ K_{. This is in agreement with the theorem 2.8: 0} _≤

m(0) ₊εfˉ ₋m(ε) _≤ _α(L_C2₎ε2 since _+∞L2 ε2 ₌ 0. That is, the theorem 2.8 confirms that if d ₌ n the exact minimum value m(ε) coincides with the upper bound for allε_≥0.

Remark 2.13. Ifk ₌nthens ₌n₊1. According to definition (9) we also have α(C,k)_{= +∞}. Note that, ifk ₌nthenK_{= {}_I_{} =}X

∗(0)because if X ∈ Mn

(10)

Proof of Theorem 2.8

Proof. The upper bound was proved in proposition 2.5. To prove the lower bound take a X_∗ _∈ X

∗(ε), aY∗ ∈ argmin{f(X): X ∈ X∗(0)} and a Y∗∗ ∈

argmin_{kX_∗₋Y_k2_:Y _∈X ∗(0)}.

Since X_∗_∈X

∗(ε)andY∗∈Kwe have:

0_≤Tr_{CY_∗_{} +}εf(Y_∗)₋ Tr_{C X_∗_{} +}εf(X_∗). (11)

Developing (11), using that Tr_{CY_∗_{} =}Pk

i=1λi(C)and (10) we obtain:

0 _≤ Tr_{CY_∗_{} +}εf(Y_∗)₋ Tr_{C X_∗_{} +}εf(X_∗)

= − Tr_{C X_∗_{} −}Tr_{CY_∗_}

+ε f(Y_∗)₋ f(X_∗)

= − Tr_{C X_∗_{} −}

k

X

i=1

λi(C)

!

+ε f(Y_∗)₋ f(X_∗)

≤ −α(C,k) min

Y∈X ∗(0)

kX_∗₋Y_k2 ₊ε f(Y_∗)₋ f(X_∗) see (10) = −α(C,k)_kX_∗₋Y_∗∗_k2₊ε f(Y_∗)₋ f(X_∗)

≤ −α(C,k)_kX_∗₋Y_∗∗_k2₊ε f(Y_∗∗)₋ f(X_∗) by definition ofY_∗ ≤ −α(C,k)_kX_∗₋Y_∗∗_k2₊εL_kY_∗∗₋X_∗_k since f is Lipschitz. We conclude two things:

α(C,k)_kX_∗−Y_∗∗k2≤εLkX_∗−Y_∗∗k (12) and

Tr_{CY_∗_{} +}εf(Y_∗)₋ Tr_{C X_∗_{} +}εf(X_∗)

≤ −α(C,k)_kX_∗₋Y_∗∗_k2₊εL_kY_∗∗₋X_∗_k. (13) Note that Tr_{CY_∗_{} =} Pk

i=1λi(C) = m(0) (because Y∗ ∈ X∗(0)). Further

recall that

f(Y_∗)₌ min

X∈X_∗(0) f(X)=: ˉf and Tr{C X∗} +εf(X∗)=m(ε)

becauseX_∗_∈X

∗(ε). Hence, we rewrite (13) as

(11)

that is,

m(ε)_≥m(0)₊εfˉ₊α(C,k)_kX_∗₋Y_∗∗_k2₋εL_kX_∗₋Y_∗∗_k. (15) Now, we consider two cases in (15):

Case 1: X_∗ ₌ Y_∗∗. In this case we obtain directly from (15) that m(ε) _≥

m(0)₊εfˉ.

Case 2: X_∗ ₆₌ Y_∗∗. In this case follows from (12) that_kX_∗₋Y_∗∗_{k ≤} _α(ε_CL_,_k₎. Replacing this last inequality in (15) we obtain:

m(ε) _≥ m(0)₊εfˉ+α(C,k)_kX∗−Y∗∗k2−

L2

α(C,k)ε

2

≥ m(0)₊εfˉ₋ L 2

α(C,k)ε

2_.

Sincem(0)₊εfˉ _≥ m(0)₊εfˉ₋ _α(L_C2₎ε2 we conclude that in both cases the following lower bound form(ε)holds:

m(ε)_≥m(0)₊εfˉ₋ L 2

α(C,k)ε

2

.

This proves the theorem 2.8.

3 Examples and comparison between the perturbed matrix minimization problem (2) and perturbed LP

In this section we give two examples of the minimization problem (2) forn ₌2 andk ₌1. We present the exact minimum value and compare it with the upper and lower bounds. The first example consists in an perturbed SDP that is not a perturbed LP.

Example 1: Consider the matrixC ₌_{1 1}

1 1

,n ₌ 2,k ₌1 and the nonlinear function f(X) ₌ X₁_,₁X₁_,₁. In this example the minimization problem (2) is given by:

m(ε)₌ min

X₁_,₁+X₂_,₂=1,

X₁_,₁,X₂_,₂≥0,

X2₁_,₂≤X₁_,₁X₂_,₂

(12)

Remark 3.1. In (16) the constraint, X₁2_,₂ _≤ X₁_,₁X₂_,₂, is nonlinear and the variable X₁_,₂ appears in the objective function. Hence, the above perturbed matrix minimization problem is not a perturbed LP.

In this example we have:

– The eigenvalues and eigenvectors of C are: λ₁(C) ₌ 0, λ₂(C) ₌ 2, vT₁ ₌ √1

2(1,1),v

T

2 = 1

√

2(1,−1); – The solution set of linear part isX

∗(0)=

n

v1v1T =

h

1/2 −1/2

−1/2 1/2

io

;

– The minimum value of f on the solution set of the linear part is fˉ _:= min_X_∈_X

∗(0) f(X)=1/4;

– The Lipschitz-constant is L ₌ √2. To see this, note that for X,Y _∈ K holds:

X₂_,₂₌1₋X₁_,₁ and Y₂_,₂₌1₋Y₁_,₁ (17) and

0_≤ X1,1,Y1,1≤1. (18) Taking (17) and (18) into account, we get

kX ₋Y_{k =}

|X₁_,₁₋Y₁_,₁_|2₊2_|X₁_,₂₋Y₁_,₂_|2_{+ |}X₂_,₂₋Y2,2|2

1/2

= 2_|X₁_,₁₋Y₁_,₁_|2₊2_|X₁_,₂₋Y₁_,₂_|2 1/2

due to (17)

≥ √2_|X₁_,₁₋Y1,1| = √

2

2 2|X1,1−Y1,1| ≥

√ 2

2 |X1,1+Y1,1| |X1,1−Y1,1| due to (18) = √1

2 |X 2 1,1−Y

2 1,1|. That is,

|X21,1−Y 2 1,1| ≤

√

2kX−Yk (19)

(13)

K_with _X ₆₌ _Y_{} ≥} _lim

δ→0|f(A)− f(Bδ)|/kA− Bδk =

√

2 for the choiceA₌_{1 0}

0 0

andB_δ ₌₁₋_δ₀

0 δ

with 0< δ _≤1;

– The upper bound in this example (see proposition 2.5) isu(ε) ₌ 1₄ε; – Sincer ₌0 ands ₌2 the lower bound in this example (see theorem 2.8)

isℓ(ε) ₌ 1₄ε₋2ε2;

According to proposition C.1 the exact minimum value m(ε) can be expres-sed as:

m(ε)₌ max

r∈[0,1]

n

1₊εr ₋p(εr)2₊₁₋_ε_r2o

. (20)

For some values of ε > 0 the maximum value in (20) can be obtained with help of the software Maple (command maximize). We present the values of m(ε)and the corresponding lower and upper bounds in the following table:

ε u(ε)₌ε/4 m(ε) ℓ(ε)₌ε/4−2ε2

1/10 0.2500000000 e-1 0.2381016622 e-1 0.0500000000 e-1 1/100 0.2500000000 e-2 0.2487562438 e-2 0.2300000000 e-2 1/1000 0.2500000000 e-3 0.2498748126 e-3 0.2480000000 e-3 1/10000 0.2500000000 e-4 0.2499949982 e-4 0.2498000000 e-4

Table 1 – Comparison among upper bound, exact minimum value and lower bound for example 1.

Table 1 shows that forε >0 the upper boundu(ε)is always strict larger than the exact valuem(ε). We prove this fact rigorously in the proposition C.2.

Example 2: Consider the diagonal matrix C ₌_{0 0}

0 2

,n ₌ 2,k ₌ 1,r ₌0, s ₌ 2 and the same nonlinear function f(X) ₌ X₁2_,₁as in the example 1. In this case the minimization problem (2) is given by:

m(ε) ₌ min

X₁_,₁+X₂_,₂=1,

X₁_,₁,X₂_,₂≥0,

X₁2_,₂≤X1,1X2,2

2X₂_,₂₊εX₁2_,₁.

Since the variableX₁_,₂does not appear in the above objective function we can rewrite this minimization problem as:

m(ε) ₌ min

(14)

This shows that the example 2 of perturbed matrix minimization problem is not more than a perturbed LP.

In this example we have m(0) ₌ λ₁(C) ₌ 0, λ₂(C) ₌ 2; r ₌ 0, s ₌ 2; v₁T ₌ (1,0), v₂T ₌ (0,1); fˉ ₌ f(v₁v₁T) ₌ 1 and L ₌ √2 (see example 1). Thereforeu(ε)₌ε(see proposition 2.5) andℓ(ε)₌ε₋2ε2_{(see theorem 2.8).} On the other hand, we can easily compute the exact minimum value as:

m(ε)₌

(

ε if 0_≤ε_≤1,

2₋1/ε if 1< ε. (21)

On the contrary of example 1, note thatu(ε)₌m(ε)for all 0_≤ ε_≤ 1. That is, there is no error by replacing the upper bound by the exact minimum value if we require thatεbe small enough. This is a property of all Linear Programmings perturbed by a Lipschitz function (see section 4).

4 Note about LP perturbed by a Lipschtz function

In the context of Linear Programming (LP) the corresponding matrix minimiza-tion problem (2) is given by:

m_LP(ε)_:=min

( _n X

i=1

c_ix_i₊ε f(x)_:x _∈K

LP

)

(22)

for a fixedc_∈Rn. Here the domain of the objective function is:

K

LP :=

(

x _∈Rn:

n

X

i=1

xi =k and 1≥x1,x2, . . . ,xn ≥0

)

and f _:K

LP → Ris a Lipschitz function, that is, there is a 0 ≤ Lmax < ∞ such that:

|f(x)₋f(y)_{| ≤}L_max_kx₋y_k_max, where _kx₋y_k_max_:= max

i=1,...n|xi−yi| (23)

is the max-norm.

We define S(0) _:= argmin Pn_i₌₁c_ix_i: x ∈ K

LP , that is, the preimage of the valuem_LP(0)(the set of minimizers of the unperturbed problem), S(ε) _:= argmin P_in₌₁c_ix_i₊εf(x)_:x _∈K

LP ,Sf(0):=argmin{f(x): x ∈S(0)}and

ˉ

(15)

The authors [1] showed that for small enoughεthere is no error by replacing the upper bound,u_LP(ε)_:=m_LP(0)₊εfˉ, by the exact minimum value,m_LP(ε), namely:

m_LP(ε) ₌ m_LP(0)₊εfˉ for all 0_≤ε_≤ε_o, (24) whereε_ois strict positive and given by:

ε_o_:= αLP

Lmax

>0. (25)

This means thatS_f(0)_⊂S(ε)for all 0_≤ε_≤ε_o. In [1]α_LPcan be taken as the largest positive constant that satisfies the inequality

n

X

i=1

cixi −mLP(0)≥αLP min

y∈S(0)kx−ykmax (26)

for allx _∈ K

LP. It is interesting to compare (26) with (10) and (9). In fact, in [1] it is only proved thatα_LP >0. That is, in [1] there is no explicit formula for α_LPin terms ofc.

Remark 4.1. In this section we are using the max-norm in (23) and (26) in order to follow [1]. We can assure the strict positiveness ofε_ofor any choice of norm, since inRnall norms are equivalent.

In order to illustrate the result (24)-(25) of [1], note that minimization problem of example 2 is as in (22). To see this, we identifyx₁ ₌ X1,1 andx2 = X2,2. Hence,

m_LP(ε)₌ min

x₁+x₂=1

x₁,x₂≥0

2x2₊εx₁2. (27)

It is easy to show that for this example thatL_max₌2 andα_LP ₌2. So, by (25) εo=1. According to the exact solution, see (21),εo=1 is the best (the largest)

value for the equation (24) holds true.

Appendix A. Proof of proposition 2.3

(16)

Lemma A.1(Cauchy-Schwarz-Bunjakowski Inequality for the entries of pos-itive semidefinite matrices). If W _∈ M_n is a symmetric positive semidefinite matrix then its entries satisfy the following inequality:

Wi,j

2

≤ W_i_,_iW_j_,_j for all i, j ₌1,2, . . . ,n.

Proof. see [4] page 398.

Corollary A.2. Consider W _∈ Mnwith I W 0. In this case we have the

following implications:

a) If W_ℓ,ℓ ₌ 1for someℓ_{∈ {}1, . . . ,n_}then W_ℓ,_j ₌0for all1 _≤ j _≤ n ( j₆₌ℓ) and W_i_,ℓ₌0for all1_≤i_≤n (i ₆₌ℓ);

b) If W_ℓ,ℓ ₌ 0for someℓ_{∈ {}1, . . . ,n_}then W_ℓ,_j ₌0for all1 _≤ j _≤ n and W_i_,ℓ₌0for all1_≤i _≤n;

Proof. To prove a) we use that I W, so I ₋ W 0. It follows by lemma A.1 that

δi,j −Wi,j

2

≤ (1₋W_i_,_i)(1₋Wj,j). Hence, if W_ℓ,ℓ = 1

then W_ℓ,_j ₌ 0 for j ₆₌ ℓandW_i_,ℓ ₌ 0 fori ₆₌ ℓ. The proof of b) uses that

W 0 and the lemma A.1.

Proof of Proposition 2.3.

Proof. Since C is symmetric there is an orthonormal-basis of eigenvectors. We denote it by _{vi}ni=1 ⊂ Rn. Since this basis is orthonormal the trace Tr_{C X_}is expressed as

Tr_{C X_{} =}

n

X

i=1

v_iTC Xv_i ₌ n

X

i=1

λ_i(C) v_iTXv_i. (28)

We defineW_i_,_j(V,X) _:=v_iTXv_j fori, j =1,2, . . . ,n. That is, the matrixW is given byW _:= VT_{X V}_{. Recall that} _V _{is the matrix, whose columns are the}

eigenvectorsv_i. Note that for all X _∈K_{we have:}

0_≤W_i_,_i(V,X)_≤1 and

n

X

i=1

(17)

because I X 0 and Tr_{X_{} =} k respectively. Combining (28) with (29) we obtain:

min Tr{X}=k:

IX0

Tr_{C X_{} ≥} min x∈C

n

X

i=1

λ_i(C)x_i, (30)

where the setC _:=_x ₌(x1,x2, . . . ,xn)∈ [0,1]n: x1+x2+ ∙ ∙ ∙ +xn = k

is convex and compact.

We claim that we have equality in (30). To see this, lety ∈C_{be a minimizer} of theright hand side(RHS) of (30). Further, take X(y)_:= V D(y)VT, where D(y)is the diagonal matrix which the diagonal entries are the yi’s andV is the

matrix which columns are the eigenvectorsvi. We will show thatX(y)∈Kand

Tr{C X(y)_}is equal the minimum value of the RHS of (30). Indeed, X(y)_∈K since Tr_{V D(y)VT

} =Tr_{D(y)_{} =}Pn

i=1yi =kandI V D(y)VT 0 since

1_≥ yi ≥0. Moreover, Tr{C X(y)} =Tr{C V D(y)VT} =Tr{VTC V D(y)}(the

trace is cyclic). On the other hand, VT_{C V}

= D(λ) (spectral decomposition ofC). Hence, Tr_{C X(y)_{} =} Tr_{D(λ)D(y)_{} =} P

i=1λi(C)yi, which is the

minimum value of the RHS of (30). Therefore,

min x∈C

n

X

i=1

λ_i(C)x_i ₌X

i=1

λ_i(C)yi =Tr{C X(y)} ≥ min

Tr{X}=k: IX0

Tr_{C X_}. (31)

Comparing (30) with (31) we obtain the claim, that is,

min Tr{X}=k:

IX0

Tr_{C X_{} =} min x∈C

n

X

i=1

λ_i(C)x_i. (32)

The fundamental theorem of linear optimization (FTLP, see [2]) states that

min x∈C

n

X

i=1

λ_i(C)x_i ₌ min x∈Ext(C)

n

X

i=1

λ_i(C)x_i. (33)

Here Ext(C₎_{is the set of the extreme points of}C_{. It is well-known that}

Ext(C) ₌

x₌(x1,x2, . . . ,xn)∈ {0,1}n|x1+x2+ ∙ ∙ ∙ +xn = k . (34)

Note that #Ext(C₎₌ n k

Hence:

min x∈Ext(C)

n

X

i=1

λ_i(C)x_i ₌ min

(18)

The minimization problem of the right side of (35) is easy to solve. More precisely:

min

1≤i₁<i₂<∙∙∙<i_k≤nλi1(C)+λi2(C)+ ∙ ∙ ∙λik(C) =λ₁(C)₊λ₂(C)_{+ ∙ ∙ ∙ +}λ_k(C).

(36)

In the last equality we used the fact thatλ₁(C) _≤ λ₂(C) _{≤ ∙ ∙ ∙ ≤} λ_n(C) (in-creasing order).

Combining (32), (33), (35) and (36) follows thatλ₁(C)_{+ ∙ ∙ ∙ +}λ_k(C)is the minimum value of the problem (1).

Next, we will characterize the set of minimizers of the problem (1). We denote it byX

∗(0).

According to the definitions ofr(C,k),s(C,k)andd(C,k)(see definition 2.2) the eigenvalues ofCare ordered as

λ₁ _≤ . . . _≤ λ_r < λ_r₊₁ ₌ λ_r+₂ _{= ∙ ∙ ∙ =} λr+d < λs ≤ ∙ ∙ ∙ ≤ λn.

The FTLP states also that

argmin

( _n X

i=1

λi(C)xi:x∈C

)

= convex hull of Ext_⋆(C_{) ,} ₍₃₇₎

where

Ext_⋆(C₎_:=

   

  

x₌(x1,x2, . . . ,xn)∈ {0,1}n:

x1=x2= ∙ ∙ ∙ =xr =1, xs =xs+1= ∙ ∙ ∙ =xn=0

and

x₁₊x₂_{+ ∙ ∙ ∙ +}x_n₌k

   

  

. (38)

Note that #Ext_∗(C₎₌ d k−r

. The set of the RHS of (37) is the smallest convex subset ofC _{that contains Ext}

∗(C). Recalling (38) and the factn = r +d+s,

it follows that the convex hull of Ext_∗(C₎_{is clearly given by:}

(

(1,_{∙ ∙ ∙} ,1,t_r₊₁,t_r₊₂,_{∙ ∙ ∙} ,t_r₊_d,0,_{∙ ∙ ∙}0)_{∈ [}0,1_]n_:

r+d

X

i=r+1

t_i ₌k₋r

)

. (39)

We define the following three sets: 1)K_d _{:= {}_Z _∈ _M_d_:_Tr_{_Z_{} =}_k₋_r _and

(19)

2)

K r,d,s :=

    

   

{I_r _⊕Z_⊕0_: Z _∈K

d} if r ≥1 and s ≤n,

{Z _⊕0_: Z _∈K

d} if r =0 and s ≤n,

{I_r _⊕Z_: Z _∈K

d} if r ≥1 and s =n+1,

K_d if r ₌0 and s ₌n₊1,

and 3)VK r,d,sV

T

:= {V W VT: W ∈ K

r,d,s}. We mean by Ir ⊕ Z ⊕0 the

n_×nmatrix in the block form:





Ir | 0 | 0 − − − − − −

0 | Z | 0

− − − − − −

0 | 0 | 0





In this block, I_r is ther _×r identity matrix, Z is ad_×d matrix and, by 0 in the diagonal, we mean the(n₋s₊1)_×(n₋s₊1)zero-matrix.

We claim that if X_∗ is a minimizer of problem (2) then the corresponding matrixW_∗₌W(V,X_∗)_:=VT_X

∗V is an element ofKr,d,s. To see this, note that

if X_∗ is a minimizer then by (28) one has Tr_{C X_∗_{} =} Pn

i=1λi(C)W∗i,i and by

(37) and (39) the vectorW_∗₁_,₁,W_∗₂_,₂, . . . ,W_∗n_,_nmust satisfy: a) W_∗₁_,₁₌W_∗₂_,₂_{= ∙ ∙ ∙ =}W_∗r_,_r ₌ 1_; b) W_∗s_,_s ₌W_∗s₊₁_,_s₊₁_{= ∙ ∙ ∙ =}W_∗n_,_n₌0_; c) Pn

i=1W∗i,i =k.

(40)

Now, combining (40) a) and (40) b) with the corollary A.2 we conclude that:

a) For any i₌1,2, . . . ,r and j₌1,2, . . . ,n we haveW_∗i_,_j ₌δ_i_,_j,

a’) For any j ₌1,2, . . . ,r andi ₌1,2, . . . ,nwe haveW_∗i_,_j ₌δ_i_,_j, b) For anyi₌s,s₊1, . . . ,nand j₌1,2, . . . ,nwe haveW_∗i_,_j ₌0 and b’) For any j ₌s,s₊1, . . . ,nandi₌1,2, . . . ,nwe also haveW_∗i_,_j ₌0.

This proves that then_×nmatrixW_∗is of the form





Ir| 0 | 0 − − − − − −

0| ⋆ | 0

− − − − − −

0| 0 | 0



(20)

Here⋆is ad_×d(note thatd ₌s₋r₋1) submatrix which satisfiesI ⋆0 becauseW_∗ ₌ VT_X

∗V and X∗ satisfies also I X 0. Due to the (40) c)

one has Tr_{⋆_{} =} k. Therefore W_∗ _∈ K

r,d,s. Since VT = V−1 (because V is

orthogonal) we obtain thatX

∗(0)⊂VKr,d,sV T_.

On the other hand, the setVK

r,d,sVT is a subset ofX∗(0)because ifZ ∈Kd

then

TrC V(I_r _⊕Z_⊕0)VT ₌Tr_{3 (I_r _⊕Z _⊕0)_} since Tr is cyclic andC ₌V3VT

=

r

X

i=1

λ_i1₊

d

X

i=1

λ_r₊_i Z_r₊_i_,_r₊_i ₌

r

X

i=1

λ_i ₊λ_k d

X

i=1

Z_r₊_i_,_r₊_i

since λ_r₊₁_{= ∙ ∙ ∙}λ_r₊_d ₌λ_k ₌ r

X

i=1

λ_i ₊(k₋r)λ_k

since Tr_{Z_{} =}k₋r ₌

k

X

i=1

λ_i

ThereforeVK r,d,sV

T

⊂X ∗(0).

In the particular cased ₌ 1 (_⇒r ₊1 ₌ k) we have Z _{= [}1_]. Hence, the only element of the setX

∗(0)is X =V(Ir ⊕ [1] ⊕0)VT =V(Ir+1⊕0)VT = V(I_k _⊕0)VT ₌Pk

i=1viv T

i . This is a orthogonal projection of rankk.

Recall that if the matrix C has degenerate eigenvalues then there are many choices for the orthogonal matrixV. We prove that the set of minimizers does not depend on the choice ofV. Consider the eigenvalues ofCwithout counting the multiplicity, that is, μ1(C) < μ2(C) < ∙ ∙ ∙ < μp(C). We denote by ν1 the multiplicity of μ₁(C), byν₂ is the multiplicity of μ₂(C)and so on. The corresponding eigenspacesE_i _⊂Rn_,_i

=1, . . . ,pare pairwise orthogonal and dimEi = νi. Note that

Pp

i=1νi = n. The many choices for V comes from

the fact that there are many orthogonal basis for each eigenspace. However, two different orthogonal basis of an eigenspace are related to each other by an orthogonal matrix. Suppose thatV˜ were other choice, thenV andV˜ are related by V˜ ₌ V(U₁ _⊕U₂ _{⊕ ∙ ∙ ∙ ⊕}U_p) where U_i is a ν_i _×ν_i orthogonal matrix. We claim that the solution sets X_∗(0) _{:= {}V(Ir ⊕Z ⊕0)VT: Z ∈ Kd}and

˜

X_∗(0) _{:= { ˜}V(I_r _⊕ Z _⊕0)V˜T

: Z _∈ K

(21)

˜

V(Ir ⊕ Z⊕0)V˜T =V(U1⊕ ∙ ∙ ∙ ⊕Up)(Ir ⊕Z⊕0)(U1⊕ ∙ ∙ ∙ ⊕Up) T _VT

= V(I˜_r _{⊕ ˜}Z _⊕0)VT _where _Z_˜

= (U₁_{⊕ ∙ ∙ ∙ ⊕}Up)Z(U₁⊕ ∙ ∙ ∙ ⊕Up)T and I˜r = (U₁_{⊕ ∙ ∙ ∙ ⊕}Up)Ir(U1⊕ ∙ ∙ ∙ ⊕Up)T. But it easy to see thatI˜r =Ir. Therefore,

˜

V(I_r _⊕Z _⊕0)V˜T ₌V(I_r _{⊕ ˜}Z _⊕0)VT and since Z _∈ Kd ⇔ ˜Z ∈Kd follows

the claim.

5 Proof of Lemma B.1

Lemma B.1. Let C _∈ M_n be a symmetric matrix. For each k ₌ 1,2, . . . ,n, the following strict positive constantα(C,k)defined by

α(C,k)_:= 1 2 min

λ_k(C)₋λ_r(C)

s₋(k₊1) ,

λ_s(C)₋λ_k(C)

k

(41)

satisfies the inequality

Tr_{C X_{} −}

k

X

i=1

λ_i(C) _≥ α(C,k) min

Y∈X ∗(0)

kX₋Y_k2 (42) for all X _∈ K_{. In} ₍₄₁₎_{the values r} ₌ _r₍_C_,_k₎_{and s} ₌ _s₍_C_,_k₎_{are given by}

definition2.2. Moreover, we do the convention that λ₀(C) _{:= −∞}in the case r ₌0andλ_n₊₁(C)_{= +∞}in the case s ₌n₊1. In(42)we denote byX

∗(0)

the setargmin{Tr{C X}: X ∈K_}_.

The proof of the lemma B.1 is based on the three propositions below. More precisely, the propositions B.2, B.3 and B.6.

Note that the matrix function Tr{ ∙ }(consequently alsok ∙ k2_{) is invariant by} conjugation of an orthogonal matrix, that is, Tr_{S AST

} =Tr_{A_}for all A_∈ M_n andSorthogonal. Due to this fact, we can reduce the proof of (42) to the case thatC is diagonal. More precisely, we have:

Proposition B.2. Let C ₌ V3VT be a spectral decomposition of the sym-metric matrix C, with diagonal matrix 3 _:= diag( λ₁(C), . . . , λ_n(C) ) and a corresponding orthogonal eigenvector matrix V . If X _∈ K _{then for W} ₌

W(X)_:= VTX V holds:

Tr_{C X_{} −}

k

X

i=1

λ_i(C) ₌ Tr_{3W_{} −}

k

X

i=1

(22)

and

min

Y∈X ∗(0)

kX₋Y_k2 _≤ min

Z∈K

d

kW₋I_r _⊕Z_⊕0_k2 . (44) HereK

d :=

Z _∈ M_d_:Tr_{Z_{} =}k₋r and I Z 0 .

Proof. Since the trace is orthogonal-invariant we have Tr_{C X_{} =}Tr

V3VTV W VT ₌Tr_{3W_}.

This proves (43). To prove (44) takeZ_∗∗ _∈argmin_{kW₋I_r_⊕Z_⊕0_k2

: Z _∈K d}.

Due to the proposition 2.3 Y_∗∗ _:= V(I_r _⊕ Z_∗∗ _⊕0)VT _∈ X

∗(0). From the

orthogonal invariance of_{k ∙ k}2_{follows that the} min

Y∈X ∗(0)

kX ₋Y_k2 _{≤ k}X₋Y_∗∗_k2 _{= k}V W VT ₋V(I_r _⊕Z_∗∗_⊕0)VT_k2 = kW ₋I_r_⊕Z_∗∗_⊕0_k2₌ min

Z∈K

d

kW₋I_r _⊕Z_⊕0_k2.

This proves (44).

In order to prove (42) we establish a lower bound for Tr_{3W_{} −}P

i=1λi(C)

(see proposition B.3) and an upper for min_Z∈K

d

kW₋I_r _⊕Z_⊕0_k2 _(see proposition B.6). Namely:

Proposition B.3. Let be n≥2. For all W ∈K_{we have}

Tr_{3W_{} −}

k

X

i=1

λ_i(C) _≥

λ_k(C)₋λ_r(C)

r₋

r

X

i=1 W_i_,_i

!

+λ_s(C)₋λ_k(C) n

X

i=s

W_i_,_i.

(45)

In the cases r ₌0and s₌n₊1we have

Tr_{3W_{} −}

k

X

i=1

λ_i(C) _≥ λ_s(C)₋λ_k(C) n

X

i=s

W_i_,_i. (46)

and

Tr_{3W_{} −}

k

X

i=1

λ_i(C) _≥

λ_k(C)₋λ_r(C) r₋

r

X

i=1 W_i_,_i

(47)

(23)

Proof. Recall thatW _∈ K_{means that Tr}_{_W_{} =} _k _and_I _W _0. Conse-quentlyW_i_,_i ₋1 _≤ 0, Wi,i ≥ 0 andPn_i₌₁W_i_,_i = k. In order to simplify the

notation we identifyW_i_,_i withw_i andλ_i(C)withλ_i. Note that

Tr_{3W_{} −}

k

X

i=1

λ_i ₌ n

X

i=1

λ_iw_i ₋ k

X

i=1

λ_i(C)1₌

k

X

i=1

λ_i(w_i ₋1)₊ n

X

i=k+1

λ_iw_i

=

r

X

i=1

λ_i(w_i ₋1)₊ k

X

i=r+1

λ_i(w_i ₋1)₊ s−1

X

i=k+1

λ_iw_i ₊ n

X

i=s λ_iw_i

From the following observations:

a) Combiningw_i ₋1 _≤ 0 fori ₌ 1, . . . ,k withλ_r _≥ λ_i fori ₌ 1, . . . ,r andλ_k _≤λ_j for j ₌r₊1, . . . ,k;

b) Combiningw_i _≥ 0 fori ₌ s, . . . ,n withλ_s _≤ λ_i fori ₌ s, . . . ,n and λ_k _≥λ_j for j ₌r ₊1, . . . ,k;

c) λ_k ₌λ_r₊_i fori ₌1, . . . ,d andλ_k ₌λ_k₊_j for j ₌1, . . . ,s₋1;

d) Pk

i=r+1wi+

Ps−1

i=k+1wi = k−

Pr

i=1wi−

Pn

i=swi since

Pn

i=1wi =k

, we conclude that

Tr_{3W_{} −}

k

X

i=1

λ_i _≥λ_r₋r ₊

r

X

i=1

w_i

+λ_k

−(k₋r)₊ k

X

i=r+1

w_i

+λ_k₊₁ s−1

X

i=k+1

w_i ₊λ_s n

X

i=s wi

= −λ_r

r ₋

r

X

i=1

w_i

+λ_k

r ₋k₊

k

X

i=r+1

w_i ₊ s−1

X

i=k+1

w_i

+λ_s n

X

i=s w_i

= −λ_r

r ₋

r

X

i=1

w_i

+λ_k

r₋

r

X

i=1

w_i₋ n

X

i=s w_i

+λ_s n

X

i=s w_i

=λk−λr r − r

X

i=1

w_i

+λ_s ₋λ_k

_Xn

s=1

w_i,

which proves the proposition B.3.

In order to establish an upper bound for min_Z_∈K

d kW−Ir⊕Z⊕0k

(24)

Lemma B.4. For any positive semidefinite matrix A_∈ M_pthe inequality kA_k2_:= Tr_{A2_{} ≤} Tr_{A_}2

=: Tr2_{A_} (48) holds.

Proof. To prove (48) note that ifA0 then 0≤λ_i(A). Hence, kA_k2₌

p

X

i=1

λ2_i(A)_≤ p

X

i=1

λ2_i(A)₊2

p

X

i<j

λ_i(A) λ_j(A)

=

p

X

i

λ_i(A)

!2

=Tr2_{A_}.

(49)

Lemma B.5. For any symmetric matrix A_∈ M_dwith I A0we have the following inequality

min

kA₋Z_k2_: Z _∈ M_d with Tr_{Z_{} =}k₋r and I Z 0

≤ (k−r)₋Tr{A}2 (50)

Proof. Case 1: A₆₌0. Since A0 we have in this case that Tr_{A_}>0. So we can take Z_o_:= _Trk−_{_Ar_} Aas a trial element of the domain of the minimization problem (50). Note thatZosatisfiesI Z 0, sinceAalso does, and satisfies

Tr_{Z_o_{} =} k ₋r. Hence, the Left Hand Side, LHS, of (50) has the following upper bound:

LHS _{≤ k}A₋Z_o_k2₌

1₋ k−r Tr_{A_}

2

kA_k2 = (k₋r) ₋Tr_{A_}2 kAk

2 Tr2_{A_}.

(51)

On the other hand, since A 0 we have by (48) that_kA_k2/Tr2_{A_{} ≤} 1. This closes the proof in the first case.

Case 2: A₌ 0. In this case Tr_{A_{} =} 0. Using (48) again, we have_kZ_k2 ≤ Tr2{Z}. Moreover, Tr2{Z} =(k−r)2₌ (k−r)₋02= (k−r)₋Tr{A}2

. Hence, LHS_≤ (k₋r)₋Tr_{A_}2