
3 $\mathbb{F}_2$-sketching over the uniform distribution

▶ Definition 7 (Fourier dimension). The Fourier dimension of $f : \mathbb{F}_2^n \to \{+1,-1\}$, denoted $\dim(f)$, is the smallest integer $k$ such that there exists a subspace $A \subseteq \mathbb{F}_2^n$ of dimension $k$ for which $\mathrm{Spec}(f) \subseteq A$.

We say that $A \subseteq \mathbb{F}_2^n$ is a standard subspace if it has a basis $v_1, \ldots, v_d$ where each $v_i$ has Hamming weight equal to 1. The orthogonal subspace $A^\perp$ is defined as:

$$A^\perp = \{\gamma \in \mathbb{F}_2^n : \forall x \in A,\ \gamma \cdot x = 0\}.$$

An affine subspace (or coset) of $\mathbb{F}_2^n$ of the form $A = H^\perp + a$ for some subspace $H \subseteq \mathbb{F}_2^n$ and $a \in \mathbb{F}_2^n$ can equivalently be written as:

$$A = \{\gamma \in \mathbb{F}_2^n : \forall x \in H,\ \gamma \cdot x = a \cdot x\}.$$
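To make these definitions concrete, the following minimal sketch (our own illustrative helpers, not part of the paper) computes the Fourier spectrum of a toy function over $\mathbb{F}_2^3$ and reads off its Fourier dimension.

```python
import itertools

n = 3  # toy dimension; all names below are illustrative
points = list(itertools.product([0, 1], repeat=n))

def chi(gamma, x):
    # Character chi_gamma(x) = (-1)^{gamma . x} over F_2^n.
    return (-1) ** (sum(a * b for a, b in zip(gamma, x)) % 2)

def fhat(f, gamma):
    # Fourier coefficient hat f(gamma) = E_x[f(x) chi_gamma(x)], x uniform.
    return sum(f(x) * chi(gamma, x) for x in points) / len(points)

# Toy function: f(x) = (-1)^{x_1 + x_2}, a single parity.
f = lambda x: (-1) ** ((x[0] + x[1]) % 2)

spec = [g for g in points if abs(fhat(f, g)) > 1e-9]
print(spec)  # [(1, 1, 0)]: Spec(f) lies in a 1-dimensional subspace, so dim(f) = 1
```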

We now introduce notation for restrictions of functions to affine subspaces.

▶ Definition 8. Let $f : \mathbb{F}_2^n \to \mathbb{R}$ and $z \in \mathbb{F}_2^n$. We define $f^{+z} : \mathbb{F}_2^n \to \mathbb{R}$ as $f^{+z}(x) = f(x+z)$.

▶ Fact 9. The Fourier coefficients of $f^{+z}$ are $\widehat{f^{+z}}(\gamma) = (-1)^{\gamma \cdot z} \hat{f}(\gamma)$, and hence:

$$f^{+z} = \sum_{S \in \mathbb{F}_2^n} \hat{f}(S)\, \chi_S(z)\, \chi_S.$$

▶ Definition 10 (Coset restriction). For $f : \mathbb{F}_2^n \to \mathbb{R}$, $z \in \mathbb{F}_2^n$ and a subspace $H \subseteq \mathbb{F}_2^n$, we write $f_{H+z} : H \to \mathbb{R}$ for the restriction of $f$ to $H + z$.

▶ Definition 11 (Convolution). For two functions $f, g : \mathbb{F}_2^n \to \mathbb{R}$ their convolution $(f * g) : \mathbb{F}_2^n \to \mathbb{R}$ is defined as $(f * g)(x) = \mathbb{E}_{y \sim U(\mathbb{F}_2^n)}[f(y)\, g(x+y)]$. For $S \in \mathbb{F}_2^n$ the corresponding Fourier coefficient of the convolution is given by $\widehat{f * g}(S) = \hat{f}(S)\, \hat{g}(S)$.
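As a sanity check, Fact 9 and the convolution identity can be verified numerically on random Boolean functions; the sketch below (again with our own helper names) does so for every $\gamma \in \mathbb{F}_2^3$.

```python
import itertools, random

points = list(itertools.product([0, 1], repeat=3))
chi = lambda g, x: (-1) ** (sum(a * b for a, b in zip(g, x)) % 2)
fhat = lambda f, g: sum(f(x) * chi(g, x) for x in points) / len(points)
xor = lambda a, b: tuple((u + v) % 2 for u, v in zip(a, b))

f = {x: random.choice([+1, -1]) for x in points}.__getitem__
g = {x: random.choice([+1, -1]) for x in points}.__getitem__
z = (1, 0, 1)

f_shift = lambda x: f(xor(x, z))  # f^{+z}(x) = f(x + z)
conv = lambda x: sum(f(y) * g(xor(x, y)) for y in points) / len(points)

for gamma in points:
    # Fact 9: hat{f^{+z}}(gamma) = chi_gamma(z) * hat f(gamma).
    assert abs(fhat(f_shift, gamma) - chi(gamma, z) * fhat(f, gamma)) < 1e-9
    # Definition 11: hat{f * g}(gamma) = hat f(gamma) * hat g(gamma).
    assert abs(fhat(conv, gamma) - fhat(f, gamma) * fhat(g, gamma)) < 1e-9
```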

▶ Theorem 14. Let $f : \mathbb{F}_2^n \to \{+1,-1\}$ be a Boolean function. Let $\xi \in [0,1]$ and $\gamma < \frac{1-\sqrt{\xi}}{2}$. Let $d = \dim_\xi(f)$. Then:

1. $D^{\to,U}_{(1-\xi)/2}(f^+) \le D^{lin,U}_{(1-\xi)/2}(f) \le d$,
2. $D^{lin,U}_{\gamma}(f) \ge d$,
3. $D^{\to,U}_{(1-\xi)/6}(f^+) \ge \frac{d}{6}$.

Proof.

Part 1.¹² Since $d = \dim_\xi(f)$, there exists a subspace $A \subseteq \mathbb{F}_2^n$ of dimension at most $d$ which satisfies $\sum_{\alpha \in A} \hat{f}^2(\alpha) \ge \xi$. Let $g : \mathbb{F}_2^n \to \mathbb{R}$ be the function defined by its Fourier transform as follows:

$$\hat{g}(\alpha) = \begin{cases} \hat{f}(\alpha), & \text{if } \alpha \in A \\ 0, & \text{otherwise.} \end{cases}$$

Consider drawing a random variable $\theta$ from the distribution with p.d.f. $1 - |\theta|$ over $[-1, 1]$.

▶ Proposition 15. For all $t$ such that $-1 \le t \le 1$ and $z \in \{+1,-1\}$, the random variable $\theta$ satisfies:

$$\Pr_\theta[\mathrm{sgn}(t - \theta) \ne z] \le \frac{1}{2}(z - t)^2.$$

Proof. W.l.o.g. we can assume $z = 1$, as the case $z = -1$ is symmetric. Then we have:

$$\Pr_\theta[\mathrm{sgn}(t - \theta) \ne 1] = \int_t^1 (1 - |\gamma|)\, d\gamma \le \int_t^1 (1 - \gamma)\, d\gamma = \frac{1}{2}(1-t)^2. \qquad ◀$$
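Since the triangular density $1 - |\theta|$ on $[-1, 1]$ is the law of the difference of two independent Uniform$[0,1]$ variables, Proposition 15 is easy to check by Monte Carlo; a quick sketch of ours (arbitrary seed; the small slack absorbs sampling noise, as the bound is tight for $t \ge 0$):

```python
import numpy as np

rng = np.random.default_rng(0)
# theta with density 1 - |theta| on [-1, 1]: difference of two uniforms.
theta = rng.uniform(size=10**6) - rng.uniform(size=10**6)

for t in [-0.8, -0.3, 0.0, 0.4, 0.9]:
    err = np.mean(np.sign(t - theta) != 1.0)  # Pr[sgn(t - theta) != 1]
    assert err <= 0.5 * (1 - t) ** 2 + 3e-3   # Proposition 15 with z = 1
```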

Define a family of functions $g_\theta : \mathbb{F}_2^n \to \{+1,-1\}$ as $g_\theta(x) = \mathrm{sgn}(g(x) - \theta)$. Note that $|g(x)| \le 1$ for all $x$, since $g$ averages $f$ over cosets of $A^\perp$, so Proposition 15 applies with $t = g(x)$. Then we have:

$$\mathbb{E}_\theta\left[\Pr_{x \sim U(\mathbb{F}_2^n)}[g_\theta(x) \ne f(x)]\right] = \mathbb{E}_{x \sim U(\mathbb{F}_2^n)}\left[\Pr_\theta[g_\theta(x) \ne f(x)]\right] = \mathbb{E}_{x \sim U(\mathbb{F}_2^n)}\left[\Pr_\theta[\mathrm{sgn}(g(x) - \theta) \ne f(x)]\right]$$
$$\le \mathbb{E}_{x \sim U(\mathbb{F}_2^n)}\left[\frac{1}{2}(f(x) - g(x))^2\right] \quad \text{(by Proposition 15)}$$
$$= \frac{1}{2}\|f - g\|_2^2.$$

Using the definition of $g$ and Parseval we have:

$$\frac{1}{2}\|f - g\|_2^2 = \frac{1}{2}\|\widehat{f - g}\|_2^2 = \frac{1}{2}\|\hat{f} - \hat{g}\|_2^2 = \frac{1}{2}\sum_{\alpha \notin A} \hat{f}^2(\alpha) \le \frac{1-\xi}{2}.$$

Thus, there exists a choice of $\theta$ such that $g_\theta$ achieves error at most $\frac{1-\xi}{2}$. Clearly $g_\theta$ can be computed from the $d$ parities forming a basis for $A$, and hence $D^{lin,U}_{(1-\xi)/2}(f) \le d$.

¹² This argument is a refinement of the standard "sign trick" from learning theory, which approximates a Boolean function by taking the sign of its real-valued approximation under $\ell_2$.
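The following toy run of Part 1 (our own construction; the helper definitions repeat those of the earlier sketches) takes the 3-bit majority function, projects it onto the standard subspace $A$ spanned by $e_1, e_2$ (so $d = 2$ and $\xi = 1/2$ here), and confirms that some threshold $\theta$ yields error at most $(1-\xi)/2 = 1/4$.

```python
import itertools

points = list(itertools.product([0, 1], repeat=3))
chi = lambda g, x: (-1) ** (sum(a * b for a, b in zip(g, x)) % 2)
fhat = lambda f, g: sum(f(x) * chi(g, x) for x in points) / len(points)

maj = lambda x: 1 if sum(x) >= 2 else -1
A = [g for g in points if g[2] == 0]      # subspace spanned by e1, e2 (d = 2)
xi = sum(fhat(maj, a) ** 2 for a in A)    # Fourier mass of maj captured by A

g_real = lambda x: sum(fhat(maj, a) * chi(a, x) for a in A)  # projection onto A
errors = []
for k in range(21):                        # grid of thresholds over [-1, 1]
    th = k / 10 - 1
    preds = [1 if g_real(x) - th >= 0 else -1 for x in points]
    errors.append(sum(p != maj(x) for p, x in zip(preds, points)) / len(points))
assert min(errors) <= (1 - xi) / 2 + 1e-9  # here: error 1/4 <= 1/4
```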

Part 2. Fix any deterministic sketch that uses $d-1$ parities $\chi_{\alpha_1}, \ldots, \chi_{\alpha_{d-1}}$ and let $S = (\alpha_1, \ldots, \alpha_{d-1})$. For fixed values of these sketches $b = (b_1, \ldots, b_{d-1}) \in \mathbb{F}_2^{d-1}$, where $(-1)^{b_i} = \chi_{\alpha_i}(x)$, we denote the resulting affine restriction of $f$ as $f|_{(S,b)}$. Using the standard expression for the Fourier coefficients of an affine restriction, the constant Fourier coefficient of the restricted function is given as:

$$\widehat{f|_{(S,b)}}(\emptyset) = \sum_{Z \subseteq [d-1]} (-1)^{\sum_{i \in Z} b_i}\, \hat{f}\left(\sum_{i \in Z} \alpha_i\right).$$

Thus, we have:

$$\widehat{f|_{(S,b)}}^{\,2}(\emptyset) = \sum_{Z \subseteq [d-1]} \hat{f}^2\left(\sum_{i \in Z} \alpha_i\right) + \sum_{Z_1 \ne Z_2 \subseteq [d-1]} (-1)^{\sum_{i \in Z_1 \Delta Z_2} b_i}\, \hat{f}\left(\sum_{i \in Z_1} \alpha_i\right) \hat{f}\left(\sum_{i \in Z_2} \alpha_i\right).$$

Taking expectation over a uniformly random $b \sim U(\mathbb{F}_2^{d-1})$, each cross term above contains the factor $(-1)^{\sum_{i \in Z_1 \Delta Z_2} b_i}$ with $Z_1 \Delta Z_2 \ne \emptyset$ and hence vanishes in expectation, so we have:

$$\mathbb{E}_{b \sim U(\mathbb{F}_2^{d-1})}\left[\widehat{f|_{(S,b)}}^{\,2}(\emptyset)\right] = \sum_{Z \subseteq [d-1]} \hat{f}^2\left(\sum_{i \in Z} \alpha_i\right).$$

The latter sum is the sum of squared Fourier coefficients over a linear subspace of dimension at most $d-1 < \dim_\xi(f)$, and hence is strictly less than $\xi$. Using Jensen's inequality:

$$\mathbb{E}_{b \sim U(\mathbb{F}_2^{d-1})}\left[\left|\widehat{f|_{(S,b)}}(\emptyset)\right|\right] \le \sqrt{\mathbb{E}_{b \sim U(\mathbb{F}_2^{d-1})}\left[\widehat{f|_{(S,b)}}^{\,2}(\emptyset)\right]} < \sqrt{\xi}.$$

For a fixed restriction $(S, b)$, if $|\widehat{f|_{(S,b)}}(\emptyset)| < \alpha$ then $|\Pr[f|_{(S,b)} = 1] - \Pr[f|_{(S,b)} = -1]| < \alpha$, and hence no algorithm can predict the value of the restricted function on this coset with probability at least $\frac{1+\alpha}{2}$. Thus no algorithm can predict $f|_{(\alpha_1, b_1), \ldots, (\alpha_{d-1}, b_{d-1})}$ for a uniformly random choice of $(b_1, \ldots, b_{d-1})$, and hence also on a uniformly at random chosen $x$, with probability at least $\frac{1+\sqrt{\xi}}{2}$. Hence $D^{lin,U}_{\gamma}(f) \ge d$ for every $\gamma < \frac{1-\sqrt{\xi}}{2}$.
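The key point in Part 2, that the cross terms vanish in expectation so that $\mathbb{E}_b[\widehat{f|_{(S,b)}}^{\,2}(\emptyset)]$ equals the squared Fourier mass on $\mathrm{span}(S)$, can also be verified numerically. The sketch below (ours; $d - 1 = 2$ independent parities on $n = 3$ bits, random Boolean $f$) checks it directly.

```python
import itertools, random

points = list(itertools.product([0, 1], repeat=3))
chi = lambda g, x: (-1) ** (sum(a * b for a, b in zip(g, x)) % 2)
fhat = lambda f, g: sum(f(x) * chi(g, x) for x in points) / len(points)

S = [(1, 0, 0), (0, 1, 1)]                        # two independent parities
f = {x: random.choice([+1, -1]) for x in points}.__getitem__

lhs = 0.0
for b in itertools.product([0, 1], repeat=len(S)):
    coset = [x for x in points
             if all(chi(a, x) == (-1) ** v for a, v in zip(S, b))]
    mean = sum(f(x) for x in coset) / len(coset)  # constant coeff. of f|_(S,b)
    lhs += mean ** 2 / 2 ** len(S)                # average over uniform b

span = {tuple((u * p + v * q) % 2 for p, q in zip(*S))
        for u in (0, 1) for v in (0, 1)}
assert abs(lhs - sum(fhat(f, g) ** 2 for g in span)) < 1e-9
```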

Part 3. We will need the following fact about the entropy of a binary random variable. The proof is given in the appendix (Section A.1).

▶ Fact 16. For any random variable $X$ supported on $\{1,-1\}$, $H(X) \le 1 - \frac{1}{2}(\mathbb{E}X)^2$.

We will also need the following proposition, which states that random variables taking values in $\{1,-1\}$ that are highly biased have low variance. The proof of Proposition 17 can be found in the appendix (Section E.1).

▶ Proposition 17. Let $X$ be a random variable taking values in $\{1,-1\}$. Define $p := \min_{b \in \{1,-1\}} \Pr[X = b]$. Then $\mathrm{Var}[X] \in [2p, 4p]$.
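Both statements are elementary to check for a $\{+1,-1\}$-valued variable with $\Pr[X = 1] = q$; a small sketch of ours over a grid of biases:

```python
import math

for q in [0.01, 0.1, 0.3, 0.5, 0.9, 0.99]:
    mean = 2 * q - 1                           # E[X] for Pr[X = 1] = q
    ent = -q * math.log2(q) - (1 - q) * math.log2(1 - q)
    assert ent <= 1 - 0.5 * mean ** 2 + 1e-12  # Fact 16
    p, var = min(q, 1 - q), 1 - mean ** 2      # Var[X] = 4q(1 - q)
    assert 2 * p <= var <= 4 * p               # Proposition 17
```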

In the next two lemmas, we look into the structure of a one-way communication protocol for $f^+$ and analyze its performance when the inputs are uniformly distributed. We give a lower bound on the number of bits of information that any correct randomized one-way protocol reveals about Alice's input, in terms of the linear sketching complexity of $f$ for the uniform distribution.¹³

The next lemma bounds the probability of error of a one-way protocol from below in terms of the Fourier coefficients of $f$ and the conditional distributions of different parities of Alice's input, conditioned on Alice's random message.

▶ Lemma 18. Let $\epsilon \in [0, \frac{1}{2})$. Let $\Pi$ be a deterministic one-way protocol for $f^+$ such that $\Pr_{x,y \sim U(\mathbb{F}_2^n)}[\Pi(x,y) \ne f^+(x,y)] \le \epsilon$. Let $M$ denote the distribution of the random message sent by Alice to Bob in $\Pi$. For any fixed message $m$ sent by Alice, let $D_m$ denote the distribution of Alice's input $x$ conditioned on the event that $M = m$. Then,

$$4\epsilon \ge \sum_{\alpha \in \mathbb{F}_2^n} \hat{f}^2(\alpha) \cdot \left(1 - \mathbb{E}_{m \sim M}\left[\left(\mathbb{E}_{x \sim D_m}[\chi_\alpha(x)]\right)^2\right]\right).$$

Proof. For any fixed input $y$ of Bob, define $\epsilon_m^{(y)} := \Pr_{x \sim D_m}[\Pi(x,y) \ne f^+(x,y)]$. Thus,

$$\epsilon \ge \mathbb{E}_{m \sim M}\, \mathbb{E}_{y \sim U(\mathbb{F}_2^n)}\left[\epsilon_m^{(y)}\right]. \quad (1)$$

Note that the output of the protocol is determined by Alice's message and $y$. Hence, for a fixed message and Bob's input, if the restricted function is largely unbiased, then any protocol is forced to commit an error with high probability. Formally,

$$\epsilon_m^{(y)} \ge \min_{b \in \{1,-1\}} \Pr_{x \sim D_m}[f^+(x,y) = b] \ge \frac{\mathrm{Var}_{x \sim D_m}[f^+(x,y)]}{4}. \quad (2)$$

Since $f^+(\cdot, y)$ takes values in $\{+1,-1\}$, the second inequality follows from Proposition 17.

Now,

$$\mathrm{Var}_{x \sim D_m}[f^+(x,y)] = 1 - \left(\mathbb{E}_{x \sim D_m}[f^+(x,y)]\right)^2 \quad \text{(since } f^+(x,y) \in \{1,-1\}\text{)}$$
$$= 1 - \left(\sum_{\alpha \in \mathbb{F}_2^n} \hat{f}(\alpha)\, \chi_\alpha(y)\, \mathbb{E}_{x \sim D_m}[\chi_\alpha(x)]\right)^2 \quad \text{(by Fact 9 and linearity of expectation)}$$
$$= 1 - \left(\sum_{\alpha \in \mathbb{F}_2^n} \hat{f}^2(\alpha) \left(\mathbb{E}_{x \sim D_m}[\chi_\alpha(x)]\right)^2 + \sum_{(\alpha_1, \alpha_2) \in \mathbb{F}_2^n \times \mathbb{F}_2^n : \alpha_1 \ne \alpha_2} \hat{f}(\alpha_1)\, \hat{f}(\alpha_2)\, \chi_{\alpha_1 + \alpha_2}(y)\, \mathbb{E}_{x \sim D_m}[\chi_{\alpha_1}(x)]\, \mathbb{E}_{x \sim D_m}[\chi_{\alpha_2}(x)]\right).$$

Taking expectation over $y$, and noting that $\mathbb{E}_{y \sim U(\mathbb{F}_2^n)}[\chi_{\alpha_1 + \alpha_2}(y)] = 0$ whenever $\alpha_1 \ne \alpha_2$, we have:

$$\mathbb{E}_{y \sim U(\mathbb{F}_2^n)}\left[\mathrm{Var}_{x \sim D_m}[f^+(x,y)]\right] = 1 - \sum_{\alpha \in \mathbb{F}_2^n} \hat{f}^2(\alpha) \left(\mathbb{E}_{x \sim D_m}[\chi_\alpha(x)]\right)^2. \quad (3)$$

¹³ We thus prove an information complexity lower bound. See, for example, [21] for an introduction to information complexity.
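Equation (3) is an identity that holds for any conditional distribution $D_m$; the following sketch (ours, with an arbitrary randomly generated distribution standing in for $D_m$) verifies it on $\mathbb{F}_2^3$.

```python
import itertools, random

points = list(itertools.product([0, 1], repeat=3))
chi = lambda g, x: (-1) ** (sum(a * b for a, b in zip(g, x)) % 2)
fhat = lambda f, g: sum(f(x) * chi(g, x) for x in points) / len(points)
xor = lambda a, b: tuple((u + v) % 2 for u, v in zip(a, b))

f = {x: random.choice([+1, -1]) for x in points}.__getitem__
w = [random.random() for _ in points]
D = [v / sum(w) for v in w]               # an arbitrary stand-in for D_m

def var_given_y(y):                        # Var_{x ~ D}[f(x + y)]
    mean = sum(p * f(xor(x, y)) for x, p in zip(points, D))
    return 1 - mean ** 2                   # f takes values in {+1, -1}

lhs = sum(var_given_y(y) for y in points) / len(points)
rhs = 1 - sum(fhat(f, a) ** 2
              * sum(p * chi(a, x) for x, p in zip(points, D)) ** 2
              for a in points)
assert abs(lhs - rhs) < 1e-9               # equation (3)
```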

Taking expectation over messages, it follows from (1), (2) and (3) that

$$4\epsilon \ge 1 - \sum_{\alpha \in \mathbb{F}_2^n} \hat{f}^2(\alpha) \cdot \mathbb{E}_{m \sim M}\left[\left(\mathbb{E}_{x \sim D_m}[\chi_\alpha(x)]\right)^2\right] = \sum_{\alpha \in \mathbb{F}_2^n} \hat{f}^2(\alpha) \cdot \left(1 - \mathbb{E}_{m \sim M}\left[\left(\mathbb{E}_{x \sim D_m}[\chi_\alpha(x)]\right)^2\right]\right). \quad (4)$$

The equality above follows from Parseval's identity (Fact 6). The lemma follows. ◀

Let $\epsilon := \frac{1-\xi}{6}$. Let $\Pi$ be a deterministic protocol such that $\Pr_{x,y \sim U(\mathbb{F}_2^n)}[\Pi(x,y) \ne f^+(x,y)] \le \epsilon$, with optimal cost $c_\Pi := D^{\to,U}_\epsilon(f^+) = D^{\to,U}_{(1-\xi)/6}(f^+)$. Let $M$ denote the distribution of the random message sent by Alice to Bob in $\Pi$. For any fixed message $m$ sent by Alice, let $D_m$ denote the distribution of Alice's input $x$ conditioned on the event that $M = m$. To prove Part 3 of Theorem 14 we use the protocol $\Pi$ to come up with a subspace of $\mathbb{F}_2^n$. Next, in Lemma 19 (a) we prove, using Lemma 18, that $f$ is $\xi$-concentrated on that subspace. In Lemma 19 (b) we upper bound the dimension of that subspace in terms of $c_\Pi$.

▶ Lemma 19. Let $A := \{\alpha \in \mathbb{F}_2^n : \mathbb{E}_{m \sim M}(\mathbb{E}_{x \sim D_m}\chi_\alpha(x))^2 \ge \frac{1}{3}\} \subseteq \mathbb{F}_2^n$. Let $\ell = \dim(\mathrm{span}(A))$. Then,
(a) $\ell \ge d$.
(b) $\ell \le 6 c_\Pi$.

Proof. (a) We prove part (a) by showing that $f$ is $\xi$-concentrated on $\mathrm{span}(A)$. By Lemma 18 we have that

$$4\epsilon \ge \sum_{\alpha \in \mathrm{span}(A)} \hat{f}^2(\alpha) \cdot \left(1 - \mathbb{E}_{m \sim M}\left(\mathbb{E}_{x \sim D_m}\chi_\alpha(x)\right)^2\right) + \sum_{\alpha \notin \mathrm{span}(A)} \hat{f}^2(\alpha) \cdot \left(1 - \mathbb{E}_{m \sim M}\left(\mathbb{E}_{x \sim D_m}\chi_\alpha(x)\right)^2\right) > \frac{2}{3} \sum_{\alpha \notin \mathrm{span}(A)} \hat{f}^2(\alpha),$$

where the last inequality holds because every $\alpha \notin \mathrm{span}(A)$ satisfies $\mathbb{E}_{m \sim M}(\mathbb{E}_{x \sim D_m}\chi_\alpha(x))^2 < \frac{1}{3}$.

Thus $\sum_{\alpha \notin \mathrm{span}(A)} \hat{f}^2(\alpha) < 6\epsilon$. Hence $\sum_{\alpha \in \mathrm{span}(A)} \hat{f}^2(\alpha) \ge 1 - 6\epsilon = \xi$, and we have $\ell = \dim(\mathrm{span}(A)) \ge \dim_\xi(f) = d$.

(b) Notice that $\chi_\alpha(x)$ is an unbiased random variable taking values in $\{1,-1\}$ when $x$ is uniform. For each $\alpha$ in the set $A$ of Lemma 19, the value of $\mathbb{E}_{m \sim M}(\mathbb{E}_{x \sim D_m}\chi_\alpha(x))^2$ is bounded away from 0. This suggests that for a typical message $m$ drawn from $M$, the distribution of $\chi_\alpha(x)$ conditioned on the event $M = m$ is significantly biased. Fact 16 enables us to conclude that Alice's message reveals $\Omega(1)$ bits of information about $\chi_\alpha(x)$. However, since the total information content of Alice's message is at most $c_\Pi$, there can be at most $O(c_\Pi)$ independent vectors in $A$. We now formalize this intuition.

Let $T = \{\alpha_1, \ldots, \alpha_\ell\} \subseteq A$ be a basis of $\mathrm{span}(A)$. Then,

$$c_\Pi \ge H(M) \quad \text{(by the third inequality of Fact 45 (1))}$$
$$\ge I(M; \chi_{\alpha_1}(x), \ldots, \chi_{\alpha_\ell}(x)) \quad \text{(by Observation 47)}$$
$$= H(\chi_{\alpha_1}(x), \ldots, \chi_{\alpha_\ell}(x)) - H(\chi_{\alpha_1}(x), \ldots, \chi_{\alpha_\ell}(x) \mid M)$$
$$= \ell - H(\chi_{\alpha_1}(x), \ldots, \chi_{\alpha_\ell}(x) \mid M) \quad \text{(by Fact 45 (3), as the } \chi_{\alpha_i}(x)\text{'s are independent random variables)}$$
$$\ge \ell - \sum_{i=1}^{\ell} H(\chi_{\alpha_i}(x) \mid M) \quad \text{(by Fact 45 (2))}$$
$$\ge \ell - \ell\left(1 - \frac{1}{2} \cdot \frac{1}{3}\right) \quad \text{(by Fact 16)}$$
$$= \frac{\ell}{6}. \qquad ◀$$
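To see the quantitative shape of part (b), consider a toy model that is entirely our own (not a protocol from the paper): $x$ is uniform on $\mathbb{F}_2^\ell$ and the message is $x$ with each bit flipped independently with probability $\eta$. Then $\mathbb{E}_{m}(\mathbb{E}_{x \sim D_m}\chi_{e_i}(x))^2 = (1-2\eta)^2$, and the exact mutual information $I(M; x) = \ell(1 - H(\eta))$ can be compared against the $\ell/6$ bound:

```python
import math

ell, eta = 4, 0.2                     # bias^2 = (1 - 2*eta)^2 = 0.36 >= 1/3
H2 = lambda q: -q * math.log2(q) - (1 - q) * math.log2(1 - q)

assert (1 - 2 * eta) ** 2 >= 1 / 3    # every parity chi_{e_i} lands in A
info = ell * (1 - H2(eta))            # exact I(M; x) for this noisy channel
assert info >= ell / 6                # consistent with the ell / 6 bound
print(info, ell / 6)                  # ~1.11 >= ~0.67
```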

Recall that $c_\Pi = D^{\to,U}_{(1-\xi)/6}(f^+)$. Part 3 of Theorem 14 follows easily from Lemma 19:

$$D^{\to,U}_{(1-\xi)/6}(f^+) = c_\Pi \ge \frac{\ell}{6} \quad \text{(by Lemma 19 (b))} \quad \ge \frac{d}{6}. \quad \text{(by Lemma 19 (a))} \qquad ◀$$

The proof of Theorem 4 now follows directly from Part 1 and Part 3 of Theorem 14 by setting $\xi = 1/3$.