2 Preliminaries - LIPIcs – Leibniz International Proceedings in Informatics

Note:

All through the paper we will be bounding errors whp as $\frac{1}{m^{\Omega(1)}}$. Note that these errors are less than $\epsilon$ if $m$ is chosen to be a sufficiently large polynomial of $\frac{1}{\epsilon}$. Think of whp to mean with probability $1 - \frac{1}{m^{\Omega(1)}}$.

All through the paper we leave the errors in terms of $m$. Think of adding up all the errors, union bounding the probabilities, and fixing all the parameters in terms of $\epsilon$; then we choose a sufficiently large $m = \frac{1}{\epsilon^{\Omega(1)}}$ to make the sum of all the errors $O(\epsilon)$ and the union of all the error probabilities $O(\epsilon)$.

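▶ Lemma 6. For any degree 2 multilinear polynomial $p = \sum_{i,j \in [n]} a_{ij} x_i x_j + \sum_{l \in [n]} b_l x_l + C$, we have the following bound:

$$\left| \mathbb{E}_{x \sim \{\pm 1\}^n} \operatorname{sgn}(p(x)) - \mathbb{E}_{x \sim N^n(0,1)} \operatorname{sgn}(p(x)) \right| \le O\left( \left[ \sum_{i=1}^{n} \frac{\mathrm{Inf}_i^2(p)}{(\mathrm{Var}[p])^2} \right]^{1/9} \right),$$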

where the $i$th influence of $p$ is defined as $\mathrm{Inf}_i(p) = \mathbb{E}\left|\frac{\partial p}{\partial x_i}\right|^2 = 2 \sum_{j \in [n]} a_{ij}^2 + b_i^2$. Think of the $i$th influence as the variance of $p$ along the $i$th coordinate.

Observe that $\mathrm{Var}[p] \le \sum_{i=1}^{n} \mathrm{Inf}_i(p) \le 2\,\mathrm{Var}[p]$. Now we define the notion of regularity for polynomials, which essentially means that no single variable has an influence that is very large compared to the rest of the variables.

▶ Definition 7. We say that the polynomial $p$ is $\tau$-regular if $\max_{i \in [n]} \mathrm{Inf}_i(p) \le \tau\,\mathrm{Var}[p]$.

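Thus for a $\tau$-regular polynomial $p$ we can bound the replacement error above as $O(\tau^{1/9})$ because

$$\sum_{i=1}^{n} \frac{\mathrm{Inf}_i^2(p)}{(\mathrm{Var}[p])^2} \le \frac{[\max_i \mathrm{Inf}_i(p)] \sum_{i=1}^{n} \mathrm{Inf}_i(p)}{(\mathrm{Var}[p])^2} \le 2\tau.$$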

Note that when we apply this, we pick $\tau = \epsilon^{O(1)}$.

Regularity Lemma

We will use the following Regularity Lemma from [6]:

▶ Lemma 8. Every multilinear degree 2 polynomial $p: \{\pm 1\}^n \to \mathbb{R}$ can be written as a decision tree of depth $D = \frac{1}{\tau} \cdot O\left(\log \frac{1}{\tau\theta}\right)^{O(1)}$ such that with probability $(1 - \theta)$ over a random leaf the resulting polynomial $p_\alpha$ is either

(i) $\tau$-regular, or (ii) $\mathrm{Var}(p_\alpha) < \theta \|p\|_2^2$.

Note that when we apply this Regularity Lemma we will choose $\theta = \frac{1}{\sqrt{m}}$, $\tau = \epsilon^{O(1)}$ so that $D = (\log m)^{O(1)}$. After all the parameters are fixed we finally pick $m = \frac{1}{\epsilon^{\Omega(1)}}$ large enough so that all the errors get bounded by $O(\epsilon)$.

2.2 Eigenvalues of polynomials, Central Limit Theorem.

Eigenvalues

Let $p: \mathbb{R}^n \to \mathbb{R}$ be a multilinear polynomial of degree 2. Thus there exist a real symmetric matrix $A$, a vector $B^t$ and a constant $C$ such that

$$p(x) = x^t A x + B^t x + C.$$

The eigenvalues of $p$ are defined to be the eigenvalues $\lambda_1, \ldots, \lambda_n$ of the real symmetric matrix $A$. Since $p$ is a multilinear polynomial, the diagonal of $A$ vanishes and we have

$$\sum_{i=1}^{n} \lambda_i = 0.$$

We have the following expression for variance of the polynomial from [3]:

$$\mathrm{Var}[p] = \sum_{i=1}^{n} (b_i^2 + 2a_{ii}^2) + \sum_{1 \le i < j \le n} a_{ij}^2.$$

The eigenvalues capture a lot of information about the polynomial. For instance if all the eigenvalues are small then the polynomial behaves like a single Gaussian. Let’s define this notion of regular polynomials.

▶ Definition 9. If all the eigenvalues of a polynomial $p$ are small relative to its variance, that is $|\lambda_{\max}(p)| \le \epsilon\sqrt{\mathrm{Var}[p]}$, then it is called an $\epsilon$-regular polynomial.

Central Limit theorem

We will need the following Central Limit Theorem from [3] (Lemma 31 in their paper). It essentially says that if all the eigenvalues of a degree 2 polynomial $p$ are small, then the polynomial can be well approximated by a single Gaussian which has the same mean and variance. That is,

▶ Lemma 10. Let $p: \mathbb{R}^n \to \mathbb{R}$ be a degree-2 polynomial over independent standard Gaussians. If $|\lambda_{\max}(p)| \le \epsilon\sqrt{\mathrm{Var}[p]}$, then $p$ is $O(\epsilon)$-close to the Gaussian $N(\mathbb{E}[p], \mathrm{Var}[p])$ in total variation distance (hence also in Kolmogorov distance).

2.3 Definition of L and basic facts.

We define $L$ as follows: $L$ is determined by a hash function $h: [n] \to [m]$ and a sign function $\sigma: [n] \to \{\pm 1\}$ via

$$L(y)_i = \sigma(i)\, y_{h(i)}.$$

Note that for each $i \in [n]$, $h(i)$ is uniformly random on $[m]$ and $\sigma(i)$ is $\pm 1$ uniformly at random; $h$ and $\sigma$ are chosen from 8-wise independent families. Thus $L$ can be represented by an $n \times m$ matrix where the $i$th row of $L$ is $c_i = \sigma(i) e_{h(i)}$, where $e_j$ is the $j$th standard basis vector of $\mathbb{R}^m$. It is depicted in the following figure:

Figure 1: Construction of $L$ (a matrix with $m$ columns whose $i$th row is $c_i$).
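The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`sample_hash_and_signs`, `build_L`, `apply_L`) are hypothetical, indices are 0-based, and fully independent draws stand in for the 8-wise independent families used in the paper.

```python
import random

def sample_hash_and_signs(n, m, rng):
    """Draw h: [n] -> [m] and sigma: [n] -> {+1, -1}.

    Assumption: fully independent draws replace the paper's
    8-wise independent families.
    """
    h = [rng.randrange(m) for _ in range(n)]
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    return h, sigma

def build_L(h, sigma, m):
    """Return L as a list of n rows of length m; row i is sigma(i) * e_{h(i)}."""
    rows = []
    for h_i, s_i in zip(h, sigma):
        row = [0] * m
        row[h_i] = s_i
        rows.append(row)
    return rows

def apply_L(L, y):
    """Compute the matrix-vector product L y."""
    return [sum(entry * y_l for entry, y_l in zip(row, y)) for row in L]

rng = random.Random(0)
n, m = 6, 3
h, sigma = sample_hash_and_signs(n, m, rng)
L = build_L(h, sigma, m)
y = [rng.choice((-1, 1)) for _ in range(m)]
x = apply_L(L, y)  # coordinate i equals sigma(i) * y_{h(i)}
```

Each row of `L` has a single nonzero entry, so `x` is again a vector of signs: coordinate $i$ copies the coordinate $y_{h(i)}$ and flips it according to $\sigma(i)$.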

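Note that the rows of $L$ satisfy the following properties:

$$\mathbb{E}_L[\langle c_i, c_j \rangle] = \mathbb{E}_L[\langle c_i, c_j \rangle^3] = \delta_{ij}, \qquad \mathbb{E}_L[\langle c_i, c_j \rangle^2] = \begin{cases} 1, & \text{if } i = j, \\ \frac{1}{m}, & \text{else.} \end{cases}$$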
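These row moments can be checked exactly for small $m$ by averaging over every choice of hash values and signs on two rows. The sketch below is an illustration, not part of the paper: the helper name `moment` is hypothetical, indices are 0-based, and it assumes fully independent $h$ and $\sigma$, using $\langle c_i, c_j \rangle = \sigma(i)\sigma(j)\, I\{h(i) = h(j)\}$.

```python
from fractions import Fraction
from itertools import product

def moment(k, m, i, j):
    """Exact E_L[<c_i, c_j>^k] for rows i, j in {0, 1}.

    Averages over all hash values and signs of two rows, assuming
    h and sigma are fully independent and uniform.
    """
    outcomes = list(product(range(m), range(m), (-1, 1), (-1, 1)))
    total = 0
    for h0, h1, s0, s1 in outcomes:
        h, s = (h0, h1), (s0, s1)
        inner = s[i] * s[j] * (1 if h[i] == h[j] else 0)
        total += inner ** k
    return Fraction(total, len(outcomes))

m = 5
first_moment = moment(1, m, 0, 1)    # distinct rows, k = 1
second_moment = moment(2, m, 0, 1)   # distinct rows, k = 2
third_moment = moment(3, m, 0, 1)    # distinct rows, k = 3
same_row = [moment(k, m, 0, 0) for k in (1, 2, 3)]
```

For distinct rows the odd moments vanish because of the random signs, while the second moment is the collision probability of the hash.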

Note that this is a standard Johnson-Lindenstrauss matrix. In the following Lemma we show that it preserves $L^2$ norms and inner products of vectors, to give a feel for the kind of computations we need. In fact $L^t$ preserves a lot more structure, as we shall see in the next section.

▶ Lemma 11. For any $n$, $\epsilon > 0$, there exists an $m = \mathrm{poly}(\frac{1}{\epsilon})$ and an explicit family of linear transformations $L^t$ (with seed length $O(\log n)$, from $\{\pm 1\}^m \to \{\pm 1\}^n$) so that for any two unit vectors $v_1, v_2 \in \mathbb{R}^n$ we have

$$|\langle L^t v_1, L^t v_2 \rangle - \langle v_1, v_2 \rangle| < \epsilon \quad \text{wp } 1 - 2\epsilon \text{ over } L.$$

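Proof. We know that $L^t v_1 = \sum_{i=1}^{n} v_1^i c_i$ and $L^t v_2 = \sum_{j=1}^{n} v_2^j c_j$. Thus we have

$$\langle L^t v_1, L^t v_2 \rangle = \Big\langle \sum_{i=1}^{n} v_1^i c_i, \sum_{j=1}^{n} v_2^j c_j \Big\rangle = \sum_{i,j \in [n]} v_1^i v_2^j \langle c_i, c_j \rangle,$$

$$\langle L^t v_1, L^t v_2 \rangle - \langle v_1, v_2 \rangle = \sum_{i \ne j \in [n]} v_1^i v_2^j \langle c_i, c_j \rangle,$$

$$(\langle L^t v_1, L^t v_2 \rangle - \langle v_1, v_2 \rangle)^2 = \sum_{\substack{i_1 \ne j_1 \\ i_2 \ne j_2}} v_1^{i_1} v_1^{i_2} v_2^{j_1} v_2^{j_2} \langle c_{i_1}, c_{j_1} \rangle \langle c_{i_2}, c_{j_2} \rangle.$$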

Note that when averaged with respect to $\mathbb{E}_L$, the only terms that survive are those that are paired either as $(i_1 = i_2, j_1 = j_2)$ or $(i_1 = j_2, i_2 = j_1)$. The rest of the terms average to 0 because of the sign $\sigma$: $\mathbb{E}_\sigma[\sigma(i_1)\sigma(i_2)\sigma(j_1)\sigma(j_2)]$ only survives if the indices are paired, and we already have the constraints $i_1 \ne j_1$, $i_2 \ne j_2$. Thus we have

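$$\mathbb{E}_L (\langle L^t v_1, L^t v_2 \rangle - \langle v_1, v_2 \rangle)^2 = \sum_{i \ne j} (v_1^i)^2 (v_2^j)^2\, \mathbb{E}_L \langle c_i, c_j \rangle^2 + \sum_{i \ne j} v_1^i v_2^i v_1^j v_2^j\, \mathbb{E}_L \langle c_i, c_j \rangle^2$$

$$= \frac{1}{m} \sum_{i \ne j} \left[ (v_1^i)^2 (v_2^j)^2 + v_1^i v_2^i v_1^j v_2^j \right] \le \frac{1}{m} \left( \|v_1\|_2^2 \|v_2\|_2^2 + \langle v_1, v_2 \rangle^2 \right) \le \frac{2}{m}.$$

Thus using Chebyshev's inequality we have

$$|\langle L^t v_1, L^t v_2 \rangle - \langle v_1, v_2 \rangle| \le \frac{1}{m^{1/3}} \quad \text{wp } 1 - \frac{2}{m^{1/3}} \text{ over } L.$$

Now we choose $m = \frac{1}{\epsilon^3}$ to have

$$|\langle L^t v_1, L^t v_2 \rangle - \langle v_1, v_2 \rangle| \le \epsilon \quad \text{wp } 1 - 2\epsilon \text{ over } L.$$

This completes the proof. ◀

To see that norms are preserved too, just choose $v_1 = v_2$ above.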
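As a sanity check on the second-moment computation in the proof, one can enumerate every $(h, \sigma)$ for tiny $n$ and $m$ and compare the exact average with the closed form $\frac{1}{m} \sum_{i \ne j} [(v_1^i)^2 (v_2^j)^2 + v_1^i v_2^i v_1^j v_2^j]$. The sketch below assumes fully independent $h$ and $\sigma$ (the paper only needs limited independence); the function names and the two rational unit vectors are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

def project(v, h, sigma, m):
    """Return L^t v: coordinate l sums sigma(i) * v[i] over all i with h(i) = l."""
    out = [Fraction(0)] * m
    for i, v_i in enumerate(v):
        out[h[i]] += sigma[i] * v_i
    return out

def exact_second_moment(v1, v2, m):
    """Average of (<L^t v1, L^t v2> - <v1, v2>)^2 over every (h, sigma)."""
    n = len(v1)
    total, count = Fraction(0), 0
    for h in product(range(m), repeat=n):
        for sigma in product((-1, 1), repeat=n):
            err = dot(project(v1, h, sigma, m), project(v2, h, sigma, m)) - dot(v1, v2)
            total += err * err
            count += 1
    return total / count

def predicted_second_moment(v1, v2, m):
    """Closed form from the proof: (1/m) * sum over ordered pairs i != j."""
    n = len(v1)
    s = sum(v1[i] ** 2 * v2[j] ** 2 + v1[i] * v2[i] * v1[j] * v2[j]
            for i in range(n) for j in range(n) if i != j)
    return Fraction(s) / m

# Two unit vectors with rational entries, so all arithmetic is exact.
v1 = [Fraction(1, 3), Fraction(2, 3), Fraction(2, 3)]
v2 = [Fraction(2, 3), Fraction(-2, 3), Fraction(1, 3)]
m = 3
exact = exact_second_moment(v1, v2, m)
predicted = predicted_second_moment(v1, v2, m)
```

With exact rational arithmetic the two quantities can be compared with `==`, and both sit below the $\frac{2}{m}$ bound that feeds Chebyshev's inequality.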

Note

All through the paper we will be computing such expected moments, bounding them by $\frac{1}{m^{\Omega(1)}}$, and then using Markov's or Chebyshev's inequality (we can't use big moments because $L$ has limited independence). Think of these errors as small because, after all the parameters are fixed, we pick $m = \frac{1}{\epsilon^{\Omega(1)}}$ to be a sufficiently large polynomial of $\frac{1}{\epsilon}$ to bound all the terms by $O(\epsilon)$. We showed the constants explicitly in the above Lemma, but we will not compute them exactly later on and will just denote them by $O(1)$.

2.4 Technical Lemmas involving L

We show that the transformation $p \to p_L$ doesn't change the variance by a lot. If $p(x) = x^t A x + B^t x + C$ then $p_L(y) = y^t (L^t A L) y + (B^t L) y + C$. Note that this is just a basic moment computation and doesn't involve anything nontrivial.

▶ Lemma 12. If $p(x) = x^t A x + B^t x + C$ is a multilinear polynomial and $p_L(y) = y^t (L^t A L) y + (B^t L) y + C$, then

$$\mathbb{E}_L \mathrm{Var}[p_L] = \sum_{i=1}^{n} b_i^2 + \left(1 + \frac{3}{m}\right) \|A\|_F^2 = \mathrm{Var}[p] + \frac{3}{m} \|A\|_F^2.$$

Proof. We know that

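$$\mathrm{Var}[p] = \sum_{i=1}^{n} (b_i^2 + a_{ii}^2) + \|A\|_F^2 = \sum_{i=1}^{n} b_i^2 + \|A\|_F^2.$$

Let's compute the same for $p_L$. Note that $L^t A L = \sum_{i,j \in [n]} a_{ij}\, c_i \otimes c_j$. Thus,

$$\|L^t A L\|_F^2 = \sum_{i_1, j_1, i_2, j_2 \in [n]} a_{i_1 j_1} a_{i_2 j_2} \langle c_{i_1}, c_{i_2} \rangle \langle c_{j_1}, c_{j_2} \rangle$$

$$= \sum_{\substack{i_1 \ne j_1 \\ i_2 \ne j_2}} \sigma(i_1)\sigma(i_2)\sigma(j_1)\sigma(j_2)\, a_{i_1 j_1} a_{i_2 j_2}\, I\{h(i_1) = h(i_2),\ h(j_1) = h(j_2)\}.$$

Let's take expectation over $\sigma$. We know that $\mathbb{E}_\sigma[\sigma(i_1)\sigma(i_2)\sigma(j_1)\sigma(j_2)] \ne 0$ iff $(i_1, j_1) = (i_2, j_2)$ or $(i_1, j_1) = (j_2, i_2)$.

Let $T_1$ denote the terms of the first kind; then we have $T_1 = \sum_{i_1, j_1} a_{i_1 j_1}^2 = \|A\|_F^2$. Let $T_2$ denote the terms of the second kind; then we have $T_2 = \sum_{i_1, j_1} a_{i_1 j_1}^2\, I\{h(i_1) = h(j_1)\}$ and thus $\mathbb{E}_L[T_2] = \sum_{i_1, j_1} a_{i_1 j_1}^2 \frac{1}{m} = \frac{1}{m} \|A\|_F^2$. Also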

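$$\langle B^t L, B^t L \rangle = \sum_{i_1, i_2 \in [n]} b_{i_1} b_{i_2} \langle c_{i_1}, c_{i_2} \rangle = \sum_{i_1, i_2} \sigma(i_1)\sigma(i_2)\, b_{i_1} b_{i_2}\, I\{h(i_1) = h(i_2)\},$$

$$\mathbb{E}_\sigma \langle B^t L, B^t L \rangle = \sum_{i} b_i^2.$$

We now compute $\sum_{l \in [m]} (L^t A L)_{ll}^2$.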

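$$\sum_{l \in [m]} (L^t A L)_{ll}^2 = \sum_{l=1}^{m} \Big( \sum_{i,j \in [n]} a_{ij} c_i^l c_j^l \Big)^2 = \sum_{i_1, i_2, j_1, j_2} a_{i_1 j_1} a_{i_2 j_2} \sum_{l=1}^{m} c_{i_1}^l c_{i_2}^l c_{j_1}^l c_{j_2}^l$$

$$= \sum_{\substack{i_1 \ne j_1 \\ i_2 \ne j_2}} \sigma(i_1)\sigma(i_2)\sigma(j_1)\sigma(j_2)\, a_{i_1 j_1} a_{i_2 j_2}\, I\{h(i_1) = h(i_2) = h(j_1) = h(j_2)\}.$$

Let's take expectation over $\sigma$. We know that $\mathbb{E}_\sigma[\sigma(i_1)\sigma(i_2)\sigma(j_1)\sigma(j_2)] \ne 0$ iff $(i_1, j_1) = (i_2, j_2)$ or $(i_1, j_1) = (j_2, i_2)$. Hence

$$\mathbb{E}_\sigma \sum_{l \in [m]} (L^t A L)_{ll}^2 = 2 \sum_{i_1, j_1} a_{i_1 j_1}^2\, I\{h(i_1) = h(j_1)\}.$$

Thus,

$$\mathbb{E}_L \sum_{l \in [m]} (L^t A L)_{ll}^2 = \frac{2}{m} \|A\|_F^2.$$ ◀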
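The identity of Lemma 12 can be verified exactly on a small instance by averaging over every $(h, \sigma)$. The sketch below takes $\mathrm{Var}_y[p_L] = \|L^t A L\|_F^2 + \|B^t L\|_2^2 + \sum_l (L^t A L)_{ll}^2$, the decomposition used in the proof of Lemma 13, and assumes fully independent $h$ and $\sigma$. The function names and the sample matrix $A$ (symmetric, zero diagonal) and vector $B$ are hypothetical illustrations.

```python
from fractions import Fraction
from itertools import product

def variance_of_pL(A, B, h, sigma, m):
    """|L^t A L|_F^2 + |B^t L|_2^2 + sum_l (L^t A L)_{ll}^2 for one fixed L."""
    n = len(B)
    # M = L^t A L: entry (k, l) sums sigma(i) sigma(j) a_ij over h(i) = k, h(j) = l.
    M = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            M[h[i]][h[j]] += sigma[i] * sigma[j] * A[i][j]
    # B^t L: coordinate l sums sigma(i) b_i over h(i) = l.
    BtL = [0] * m
    for i in range(n):
        BtL[h[i]] += sigma[i] * B[i]
    frobenius = sum(e * e for row in M for e in row)
    diagonal = sum(M[l][l] ** 2 for l in range(m))
    return frobenius + sum(e * e for e in BtL) + diagonal

def expected_variance(A, B, m):
    """Exact average of variance_of_pL over all hash functions and sign patterns."""
    n = len(B)
    values = [variance_of_pL(A, B, h, sigma, m)
              for h in product(range(m), repeat=n)
              for sigma in product((-1, 1), repeat=n)]
    return Fraction(sum(values), len(values))

# Hypothetical small instance: A symmetric with zero diagonal (multilinear p).
A = [[0, 1, 2], [1, 0, -1], [2, -1, 0]]
B = [1, -2, 3]
frob_A = sum(a * a for row in A for a in row)
sum_b2 = sum(b * b for b in B)

def lemma12_rhs(m):
    return sum_b2 + (1 + Fraction(3, m)) * frob_A
```

Comparing `expected_variance(A, B, m)` with `lemma12_rhs(m)` for a few small values of `m` exercises all three contributions: $T_1 + T_2$ from the Frobenius norm, the linear term, and the diagonal term.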

In the following Lemma we prove bounds on $\mathrm{Var}_L[\mathrm{Var}_y[p_L]]$. This will help us show that $\mathrm{Var}_y[p_L] = \Theta(\mathrm{Var}[p])$ whp.

▶ Lemma 13.

$$\mathrm{Var}_L[\mathrm{Var}_y[p_L]] = \frac{O(1)}{m}.$$

Proof. From Lemma 12 we have

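$$\mathrm{Var}_y[p_L] = \|L^t A L\|_F^2 + \|B^t L\|_2^2 + \sum_{l=1}^{m} (L^t A L)_{ll}^2$$

$$= \sum_{i_1, i_2, j_1, j_2} a_{i_1 j_1} a_{i_2 j_2} \langle c_{i_1}, c_{i_2} \rangle \langle c_{j_1}, c_{j_2} \rangle + \sum_{r_1, r_2} b_{r_1} b_{r_2} \langle c_{r_1}, c_{r_2} \rangle + \sum_{l=1}^{m} (L^t A L)_{ll}^2,$$

where

$$\sum_{l \in [m]} (L^t A L)_{ll}^2 = \sum_{\substack{i_1 \ne j_1 \\ i_2 \ne j_2}} \sigma(i_1)\sigma(i_2)\sigma(j_1)\sigma(j_2)\, a_{i_1 j_1} a_{i_2 j_2}\, I\{h(i_1) = h(i_2) = h(j_1) = h(j_2)\}.$$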

Thus we have

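$$\mathrm{Var}_y[p_L] - \mathbb{E}_L[\mathrm{Var}_y[p_L]] = \sum_{(i_1, j_1) \ne (i_2, j_2)} a_{i_1 j_1} a_{i_2 j_2} \langle c_{i_1}, c_{i_2} \rangle \langle c_{j_1}, c_{j_2} \rangle + \sum_{r_1 \ne r_2} b_{r_1} b_{r_2} \langle c_{r_1}, c_{r_2} \rangle + \sum_{l=1}^{m} (L^t A L)_{ll}^2 - \frac{3}{m} \|A\|_F^2.$$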

We skip the elaborate yet simple moment calculations, but observe that when squared and averaged over $L$, each term above will carry at least a factor of $\frac{1}{m}$. Also the corresponding coefficients can be bounded using Cauchy-Schwarz and noting that $\|B\|_2^2 \le 1$ and $\|A\|_F^2 \le 1$.

Thus

$$\mathbb{E}_L \left( \mathrm{Var}_y[p_L] - \mathbb{E}_L[\mathrm{Var}_y[p_L]] \right)^2 = O\left( \frac{\mathrm{Var}^2[p]}{m} \right).$$ ◀

Now we put together these two Lemmas to show that $\mathrm{Var}_y[p_L] = \Theta(\mathrm{Var}[p])$ whp. We omit the proof as it is a direct consequence of Chebyshev's inequality using Lemma 12 and Lemma 13.

▶ Lemma 14.

$$|\mathrm{Var}_y[p_L] - \mathrm{Var}[p]| \le O\left( \frac{\mathrm{Var}[p]}{m^{1/3}} \right) \quad \text{wp } 1 - \frac{1}{m^{1/3}} \text{ over } L.$$

The following lemma will also be useful. Intuitively it means that $L$ does not perturb an eigenvalue of $A$ by a huge amount whp. In fact this implies that all the eigenvalues of $A$ are in the pseudospectrum of $L^t A L$.

▶ Lemma 15. Let $\lambda$ be an eigenvalue of $A$ and let the unit vector $v$ be the corresponding eigenvector. Then we have

$$\mathbb{E}_L \|(L^t A L - \lambda I_{m \times m}) L^t v\|_2^2 = O\left( \frac{1}{m} \right).$$

Proof. Substituting $Av = \lambda v$, we have

$$(L^t A L - \lambda I_{m \times m}) L^t v = L^t A L L^t v - L^t A v.$$

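Expanding the product $L^t A L L^t v$ we have

$$L^t A L L^t v = \sum_{i,j,k \in [n]} a_{ij}\, c_i \langle c_j, c_k \rangle v_k, \qquad L^t A L L^t v - L^t A v = \sum_{\substack{i \\ j \ne k}} a_{ij}\, c_i \langle c_j, c_k \rangle v_k.$$

Thus

$$\|L^t A L L^t v - L^t A v\|_2^2 = \sum_{\substack{i_1, i_2 \\ j_1 \ne k_1 \\ j_2 \ne k_2}} a_{i_1 j_1} a_{i_2 j_2} v_{k_1} v_{k_2} \langle c_{i_1}, c_{i_2} \rangle \langle c_{j_1}, c_{k_1} \rangle \langle c_{j_2}, c_{k_2} \rangle.$$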

A term survives $\mathbb{E}_\sigma$ only if all the indices $\{i_1, i_2, j_1, j_2, k_1, k_2\}$ are paired appropriately. However, when we take $\mathbb{E}_h$, since we have $j_1 \ne k_1$ and $j_2 \ne k_2$, we see at least a factor of $\frac{1}{m}$ in every term. Now the corresponding coefficient can be bounded using Cauchy-Schwarz and noting that $\|v\|_2^2 = 1$ and $\|A\|_F^2 \le 1$. Thus we have

$$\mathbb{E}_L \|(L^t A L - \lambda I_{m \times m}) L^t v\|_2^2 = \frac{O(1)}{m}.$$ ◀
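A tiny worked instance makes the $\frac{1}{m}$ scaling of Lemma 15 concrete. Take the hypothetical example $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ with eigenvalue $\lambda = 1$ and unit eigenvector $v = (1, 1)/\sqrt{2}$. The sketch below averages the squared residual over every $(h, \sigma)$, assuming fully independent $h$ and $\sigma$; to stay in exact arithmetic it works with the unnormalized eigenvector $w = (1, 1)$ and divides by $\|w\|_2^2 = 2$ at the end.

```python
from fractions import Fraction
from itertools import product

def expected_residual(m):
    """Exact E_L |(L^t A L - lambda I) L^t v|_2^2 for the 2 x 2 example above."""
    A = [[0, 1], [1, 0]]
    lam = 1
    w = [1, 1]  # unnormalized eigenvector, |w|^2 = 2
    total, count = 0, 0
    for h in product(range(m), repeat=2):
        for sigma in product((-1, 1), repeat=2):
            # M = L^t A L and Ltw = L^t w for this choice of (h, sigma).
            M = [[0] * m for _ in range(m)]
            for i in range(2):
                for j in range(2):
                    M[h[i]][h[j]] += sigma[i] * sigma[j] * A[i][j]
            Ltw = [0] * m
            for i in range(2):
                Ltw[h[i]] += sigma[i] * w[i]
            residual = [sum(M[k][l] * Ltw[l] for l in range(m)) - lam * Ltw[k]
                        for k in range(m)]
            total += sum(r * r for r in residual)
            count += 1
    # Divide by |w|^2 = 2 to account for normalizing the eigenvector.
    return Fraction(total, 2 * count)

residuals = {m: expected_residual(m) for m in (2, 3, 5, 8)}
```

In this example the residual vanishes whenever the two rows hash to different buckets, so the expectation is driven entirely by the collision event $h(1) = h(2)$, matching the role of the constraint $j \ne k$ in the proof.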
