A. Bhaskara and S. Lattanzi 7:5
of the algorithm with the best $k$-column approximation, as opposed to the best rank-$k$ approximation. Second, our analysis also applies to the generalized CSS problem, where we have two matrices, and we approximate one using the columns of the other. Third, as mentioned above, our algorithm has weaker guarantees than the prior work when the matrix $B$ admits a very low error ($\epsilon \ll (1/k)\|B\|_1$) $\ell_1$ approximation.
7:6 L1 Sparse Regression and Column Selection
Algorithm 1 Warm-KL
Initialize $q^{(0)} = v$ for an arbitrary $v \in V$, $S = \emptyset$, $\eta = \delta^2/2k$, and $T = \left\lceil \log\frac{1}{4\sqrt{\epsilon + 2\delta}} \Big/ \log\frac{1}{1 - \eta/2} \right\rceil$.
for $t = 1 \ldots T$ do
    Find the column $u \in V$ that minimizes $\Phi((1-\eta)q^{(t-1)} + \eta u)$, and add it to $S$.
    Set $q^{(t)} \leftarrow (1-\eta)q^{(t-1)} + \eta u$.
end for
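A minimal NumPy sketch of Algorithm 1, assuming the columns in $V$ are given as probability vectors in $\Delta_n$; the function names `phi` and `warm_kl`, and the toy example at the end, are our own additions:

```python
import numpy as np

def phi(p, q):
    """Potential Phi(q) = KL(p || (p+q)/2) = sum_i p_i log(2 p_i / (p_i + q_i))."""
    return float(np.sum(p * np.log(2 * p / (p + q))))

def warm_kl(p, V, k, eps, delta):
    """Greedy Warm-KL loop; V is a list of columns (distributions in Delta_n)."""
    eta = delta**2 / (2 * k)
    T = int(np.ceil(np.log(1 / (4 * np.sqrt(eps + 2 * delta))) /
                    np.log(1 / (1 - eta / 2))))
    q, S = V[0].copy(), []                 # q^(0): an arbitrary starting column
    for _ in range(T):
        # pick the column u minimizing Phi((1 - eta) q + eta u), and record it in S
        c = min(range(len(V)), key=lambda c: phi(p, (1 - eta) * q + eta * V[c]))
        S.append(c)
        q = (1 - eta) * q + eta * V[c]
    return q, S

# tiny example: p is itself one of the columns, so an eps-close combination exists
p = np.array([0.2, 0.3, 0.5])
V = [np.array([0.8, 0.1, 0.1]), np.array([0.1, 0.8, 0.1]), p]
q, S = warm_kl(p, V, k=3, eps=0.001, delta=0.02)
```

Since $\eta = \delta^2/2k$ is small, $T$ is large (several thousand steps here); each step is a single pass over the columns.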
▶ Lemma 5 (Potential drop). Consider the execution of the algorithm, and suppose $\Phi(q^{(t-1)}) \ge 4(\epsilon + 2\delta)$. Then we have
$$\Phi(q^{(t)}) \le \left(1 - \frac{\eta}{2}\right)\Phi(q^{(t-1)}).$$
We first show how to prove Theorem 3 using Lemma 5; in the next subsection we focus on the proof of the lemma.
Proof of Theorem 3. Note that $\Phi(q^{(t)}) = \sum_i p_i \log \frac{2p_i}{p_i + q^{(t)}_i} \le \sum_i p_i \ln 2 \le 1$. So after $T = \left\lceil \log\frac{1}{4\sqrt{\epsilon + 2\delta}} \Big/ \log\frac{1}{1 - \eta/2} \right\rceil$ steps, we have $\Phi(q^{(T)}) \le 4(\epsilon + 2\delta)$. Now, using Pinsker's inequality, we have that
$$\left\| p - \frac{p + q^{(T)}}{2} \right\|_1 \le \sqrt{2\,\Phi(q^{(T)})} \le 2\sqrt{2(\epsilon + 2\delta)}.$$
This then implies that $\left\| p - q^{(T)} \right\|_1 \le 4\sqrt{2(\epsilon + 2\delta)}$, completing the proof of the theorem. ◀
2.1 Analyzing the potential drop
As shown in the previous subsection, the key is to analyze the drop in potential. Let us fix some $t$, and for convenience, write $q = q^{(t-1)}$ and $q' = q^{(t)}$. We have $q' = (1-\eta)q + \eta u$, for some $u \in V$. Then, we have
$$\Phi(q) - \Phi(q') = \sum_i p_i \log \frac{p_i + q'_i}{p_i + q_i} = \sum_i p_i \log\left(1 + \eta\cdot\frac{u_i - q_i}{p_i + q_i}\right). \quad (2)$$
Note that we wish to lower bound this potential drop. In other words, we need to prove that there exists a $u \in V$ such that the difference above is "large". Since we know that there is a linear combination $\sum_i \alpha_i v_i$ that is $\epsilon$-close to $p$ (in $\ell_1$), the natural goal is to prove that one of the $v_i$ achieves the necessary potential drop, by an averaging argument. This is typically done by approximating the change in $\Phi$ by a linear function of the $u_i$'s. If $\eta\cdot\frac{u_i - q_i}{p_i + q_i} < 1$, this can be done using $\log(1+x) \approx x$. But unfortunately, in our case the term $\frac{u_i - q_i}{p_i + q_i}$ can be arbitrarily large, making such an approximation impossible.
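The identity in (2) can be checked numerically: writing $\gamma_i$ for the ratio $\frac{u_i - q_i}{p_i + q_i}$ appearing in (2), we have $p_i + q'_i = (p_i + q_i)(1 + \eta\gamma_i)$, so the drop telescopes exactly (the instance below is an arbitrary random one):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eta = 6, 0.05
p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()
u = rng.random(n); u /= u.sum()

def phi(p, q):
    """Phi(q) = sum_i p_i log(2 p_i / (p_i + q_i))."""
    return np.sum(p * np.log(2 * p / (p + q)))

q_next = (1 - eta) * q + eta * u       # one step of the algorithm
gamma = (u - q) / (p + q)
# Eq. (2): the potential drop equals sum_i p_i log(1 + eta * gamma_i)
assert np.isclose(phi(p, q) - phi(p, q_next),
                  np.sum(p * np.log(1 + eta * gamma)))
```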
To deal with this, we take advantage of additional structure. The first observation is the following.
▶ Lemma 6. Let $v_j$, $1 \le j \le k$, be vectors in $\Delta_n$ such that $\left\|p - \sum_j \alpha_j v_j\right\|_1 \le \epsilon$. Then for all $\delta > 0$, there exists a subset $S^*$ of $[k]$, such that $\left\|p - \sum_{j \in S^*} \alpha_j v_j\right\|_1 \le \epsilon + \delta$, and additionally, $\alpha_j \ge \delta/k$ for all $j \in S^*$. (I.e., the coefficients used are all "non-negligible".)
▶ Remark. Even though the lemma is straightforward, it is the only place in the proof where we use the "promise" that there exists a $k$-sparse approximation to $p$.
Proof. The proof is simple: we consider $\sum_j \alpha_j v_j$, and remove all the terms whose coefficients are $< \delta/k$. As there are at most $k$ terms, and each $v_j$ has unit $\ell_1$ norm, the total $\ell_1$ norm of the removed terms is $\le \delta$. This implies that considering only the terms that remain gives an error of at most $\epsilon + \delta$. ◀

The lemma shows that we can obtain a good approximation using only large coefficients.
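Lemma 6's pruning step amounts to dropping every coefficient below $\delta/k$; a minimal sketch (the name `prune` and the example numbers are our own):

```python
import numpy as np

def prune(alpha, delta, k):
    """Return S* of Lemma 6: indices whose coefficients are >= delta/k."""
    return [j for j in range(k) if alpha[j] >= delta / k]

rng = np.random.default_rng(2)
k, n, delta = 4, 5, 0.2
V = [rng.random(n) for _ in range(k)]
V = [v / v.sum() for v in V]               # columns in Delta_n (unit l1 norm)
alpha = np.array([0.5, 0.3, 0.04, 0.16])   # one coefficient below delta/k = 0.05
S_star = prune(alpha, delta, k)

# at most k removed terms, each of l1 norm < delta/k, so removed mass <= delta
removed = sum(alpha[j] * V[j] for j in range(k) if j not in S_star)
assert np.abs(removed).sum() <= delta + 1e-12
```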
As a consequence, we now show that we can restrict our attention to a truncated version of the vectors $V$, which enables our analysis. Formally, define "truncated" vectors $w_j$ by setting
$$w_{ji} = \min\left( v_{ji},\, \frac{k p_i}{\delta}\right).$$
I.e., the $w_j$ are the vectors $v_j$, truncated so that no entry is more than $k/\delta$ times the corresponding entry in $p$. We start with a simple observation.
▶ Lemma 7. For the vectors $v_j, w_j$ as above, we have
$$\left\| p - \sum_{j \in S^*} \alpha_j w_j \right\|_1 \le \left\| p - \sum_{j \in S^*} \alpha_j v_j \right\|_1.$$
Proof. Let us look at the $i$th coordinate of the vectors on the LHS and RHS. The only way we can have $v_{ji} \ne w_{ji}$ (for some $j$) is when $v_{ji} > k p_i/\delta$, in which case $\alpha_j v_{ji} > p_i$ (since $\alpha_j \ge \delta/k$). Thus in this coordinate, $\sum_{j \in S^*} \alpha_j v_{ji}$ has a value larger than $p_i$. Now, by moving to $\sum_{j \in S^*} \alpha_j w_{ji}$, we decrease the value, but remain larger than or equal to $p_i$ (each truncated term still satisfies $\alpha_j w_{ji} \ge (\delta/k)\cdot(k p_i/\delta) = p_i$). Thus, the norm of the difference only improves. ◀
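Both the truncation and the monotonicity claim of Lemma 7 are easy to check numerically; the sketch below assumes all coefficients satisfy $\alpha_j \ge \delta/k$, as Lemma 6 guarantees:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, delta = 6, 3, 0.3
p = rng.random(n); p /= p.sum()
V = [rng.random(n) for _ in range(k)]
V = [v / v.sum() for v in V]
alpha = np.full(k, delta / k) + rng.random(k) * 0.2   # all alpha_j >= delta/k

# Truncated vectors: w_{ji} = min(v_{ji}, k p_i / delta)
W = [np.minimum(v, (k / delta) * p) for v in V]

approx_v = sum(a * v for a, v in zip(alpha, V))
approx_w = sum(a * w for a, w in zip(alpha, W))
# Lemma 7: truncation can only improve the l1 error
assert np.abs(p - approx_w).sum() <= np.abs(p - approx_v).sum() + 1e-12
```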
Let us now go back to the drop in potential we wished to analyze (eq. (2)). Our aim is to prove that setting $u = v_j$ for some $j \in S^*$ in the algorithm leads to a significant drop in the potential. We instead prove that setting $u = w_j$ (as opposed to $v_j$) leads to a significant drop in the potential. Then, we can use the fact that the RHS of (2) is monotone in $u$ (noting that $v_{ji} \ge w_{ji}$) to complete the proof.
Let us analyze the potential drop when $u = w_j$ for some $j \in S^*$. Recall that $\eta = \delta^2/2k$. Write $\gamma_i := \frac{u_i - q_i}{p_i + q_i}$. Then, we have
$$\gamma_i = \frac{p_i + u_i}{p_i + q_i} - 1 \in [-1,\, k/\delta], \quad \text{as } w_{ji}/p_i \le k/\delta \text{ for } j \in S^*.$$
Now, using the value of $\eta$, we have that $\eta\gamma_i \in (-\eta, \delta/2)$. This allows us to use a linear approximation for the logarithmic term in $\Phi$. Once we move to a linear approximation, we can use an averaging argument to show that there exists a choice of $u$ that improves the potential substantially.

More formally, let $I_+$ denote the set of indices with $u_i \ge q_i$ (i.e., $\gamma_i \ge 0$), and $I_-$ denote the other indices. Then, using that $\log(1+x) \ge x(1-x)$ for $x \ge -1/2$ (which covers our range, since $\eta\gamma_i \ge -\eta$), we have that for any $i \in I_+$, $\log(1+\eta\gamma_i) \ge \eta\gamma_i(1 - \delta/2)$. Further, for $i \in I_-$, we have $\log(1+\eta\gamma_i) \ge \eta\gamma_i(1+\eta)$. Thus, in both cases, we have that
$$\log(1 + \eta\gamma_i) \ge \eta\gamma_i - \frac{\delta\eta|\gamma_i|}{2}. \quad (3)$$
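Inequality (3), and the bound $\log(1+x) \ge x(1-x)$ it rests on, can be spot-checked numerically over the full range $\gamma_i \in [-1, k/\delta]$ (the constants $k$ and $\delta$ below are arbitrary choices):

```python
import numpy as np

k, delta = 5, 0.1
eta = delta**2 / (2 * k)
# gamma ranges over [-1, k/delta], so x = eta*gamma ranges over [-eta, delta/2]
for gamma in np.linspace(-1, k / delta, 10001):
    x = eta * gamma
    # log(1+x) >= x(1-x) on this range
    assert np.log1p(x) >= x * (1 - x) - 1e-15
    # Eq. (3): log(1 + eta*gamma) >= eta*gamma - (delta*eta/2)*|gamma|
    assert np.log1p(x) >= eta * gamma - (delta * eta / 2) * abs(gamma) - 1e-15
```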
Now, before showing how to use the inequality to conclude the proof, we use an averaging argument to prove the existence of a good column $j$.
ITCS 2018
▶ Lemma 8. There exists an index $j \in S^*$ such that setting $u = w_j$ in the algorithm gives
$$\sum_i p_i\gamma_i \ge D_{KL}\left(p \,\Big\|\, \frac{p+q}{2}\right) - (2\epsilon + \delta).$$
Proof. We start by recalling (via Lemmas 7 and 6), that
$$\left\|p - \sum_{j \in S^*}\alpha_j w_j\right\|_1 \le \left\|p - \sum_{j \in S^*}\alpha_j v_j\right\|_1 \le (\epsilon + \delta).$$
For convenience, let us write $r = \sum_{j \in S^*}\alpha_j w_j$, and so $\|p - r\|_1 \le \epsilon + \delta$. Now, we wish to analyze the behavior of $\gamma_i$ when we set $u = w_j$ for different $j$. We start by defining $\tau_i^{(j)}$ as
$$\tau_i^{(j)} := \frac{w_{ji} - q_i}{p_i + q_i},$$
i.e., the value of $\gamma_i$ obtained by setting $u = w_j$. For convenience, write $Z = \sum_{j \in S^*}\alpha_j$. Now by linearity, we have
$$\sum_{j \in S^*}\alpha_j\left(\sum_i p_i\tau_i^{(j)}\right) = \sum_i p_i\,\frac{\left(\sum_{j \in S^*}\alpha_j w_{ji}\right) - Zq_i}{p_i+q_i} = \sum_i p_i\,\frac{r_i - Zq_i}{p_i+q_i}.$$
Thus, by averaging (specifically, the inequality that if $\sum_j \alpha_j X_j \ge Y$ for $\alpha_j \ge 0$, then there exists a $j$ such that $X_j \ge Y/(\sum_j \alpha_j)$), we have that there exists a $j \in S^*$ such that
$$\sum_i p_i\tau_i^{(j)} \ge \frac{1}{Z}\cdot\sum_i p_i\,\frac{r_i - Zq_i}{p_i+q_i} = \sum_i p_i\,\frac{r_i - q_i}{p_i+q_i} + \left(\frac{1}{Z} - 1\right)\sum_i p_i\,\frac{r_i}{p_i+q_i}. \quad (4)$$
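The averaging inequality just invoked is simply the fact that a nonnegatively-weighted average never exceeds the maximum term; a one-line numeric check:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = rng.random(5)          # nonnegative weights
X = rng.standard_normal(5)
Y = np.dot(alpha, X)           # so sum_j alpha_j X_j >= Y holds (with equality)
# averaging: some X_j attains at least Y / sum_j alpha_j
assert X.max() >= Y / alpha.sum() - 1e-12
```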
The first term on the RHS can now be lower bounded as:
$$\sum_i p_i\,\frac{r_i-q_i}{p_i+q_i} = \sum_i p_i\left(\frac{2p_i}{p_i+q_i} - 1 - \frac{p_i-r_i}{p_i+q_i}\right) \ge \sum_i p_i\left(\frac{2p_i}{p_i+q_i} - 1\right) - \sum_i |p_i - r_i|,$$
where we use the fact that $|p_i/(p_i+q_i)| \le 1$. Thus, appealing to the inequality $(x-1) \ge \log x$, if we write $D := D_{KL}\left(p \,\|\, \frac{p+q}{2}\right)$, then we have
$$\sum_i p_i\,\frac{r_i-q_i}{p_i+q_i} \ge D - \epsilon - \delta, \quad \text{since } \|p-r\|_1 \le \epsilon+\delta.$$
To conclude the proof, let us consider the second term on the RHS of (4). If $Z \le 1$, the term is non-negative, and there is nothing to show. We next argue that $Z \le 1+\epsilon$, by showing that the sum of $\alpha_j$ over all $j$ can be bounded by $1+\epsilon$. In fact, note that
$$\left\|\sum_j \alpha_j v_j\right\|_1 - 1 \le \left\|p - \sum_j \alpha_j v_j\right\|_1 \le \epsilon \quad \text{(triangle inequality, using } \|p\|_1 = 1\text{)}.$$
Furthermore, for $v_j \in \Delta_n$, we have $\left\|\sum_j \alpha_j v_j\right\|_1 = \sum_j \alpha_j$. Thus $\sum_j \alpha_j \le 1+\epsilon$, and thus
$$\left(1 - \frac{1}{Z}\right)\sum_i \frac{p_i r_i}{p_i+q_i} \le \frac{\epsilon}{1+\epsilon}\sum_i r_i \le \epsilon.$$
Plugging this into (4), we can conclude the proof of the lemma. ◀

We are now ready to complete the proof of the main lemma of this section, Lemma 5.
Proof of Lemma 5. Let $D := D_{KL}\left(p \,\|\, \frac{p+q}{2}\right)$. We can use Lemma 8 in conjunction with Eq. (3) to obtain that there exists a $j$ such that setting $u = w_j$ gives us
$$\sum_i p_i\log(1+\eta\gamma_i) \ge \sum_i \eta p_i\gamma_i - \frac{\delta\eta}{2}\sum_i p_i|\gamma_i| \ge \eta(D - 2\epsilon - \delta) - \frac{\delta\eta}{2}\sum_i p_i|\gamma_i|.$$
The last term on the RHS can be bounded, noting that
$$\sum_i p_i|\gamma_i| \le \sum_i p_i\cdot\frac{|u_i-q_i|}{p_i+q_i} \le \sum_i |u_i - q_i| \le 2,$$
as $u, p, q$ are all probability distributions, and $\frac{p_i}{p_i+q_i} \le 1$. This implies that there exists a choice of $u = w_j$ (and thus setting $u = v_j$ also works, as discussed earlier), such that the potential drop is at least $\eta(D - 2\epsilon - \delta) - \eta\delta$. If $D \ge 4(\epsilon+\delta)$, this is at least $\eta D/2$. This completes the proof of the lemma. ◀
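On a toy instance where $p$ is itself a column (so $\epsilon = 0$), the conclusion of Lemma 5 can be checked directly for one greedy step; the instance and constants below are our own choices, made so that the lemma's hypothesis $\Phi(q) \ge 4(\epsilon + 2\delta)$ holds:

```python
import numpy as np

def phi(p, q):
    """Phi(q) = KL(p || (p+q)/2)."""
    return float(np.sum(p * np.log(2 * p / (p + q))))

# p is itself a column of V, so eps = 0; delta = 0.05, k = 2 columns
p = np.array([0.95, 0.05])
V = [np.array([0.05, 0.95]), p]
k, delta = 2, 0.05
eta = delta**2 / (2 * k)

q = V[0]                                      # start far from p
assert phi(p, q) >= 4 * (0 + 2 * delta)       # hypothesis of Lemma 5
# one greedy step: choose u in V minimizing the new potential
q_next = min(((1 - eta) * q + eta * u for u in V), key=lambda x: phi(p, x))
# conclusion of Lemma 5: a (1 - eta/2) multiplicative drop
assert phi(p, q_next) <= (1 - eta / 2) * phi(p, q)
```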