Fact 2.6
Conditioning decreases entropy: for any two random variables A and B, H(A | B) ≤ H(A).
Another measure we will use (briefly) in our proof is that of mutual information, which informally captures the correlation between two random variables.
Definition 2.14
Mutual information and conditional mutual information. The mutual information between two random variables A and B, denoted by I(A; B), is defined as
I(A; B) ≜ H(A) − H(A | B) = H(B) − H(B | A).
The conditional mutual information between A and B given C, denoted by I(A; B | C), is defined as
I(A; B | C) ≜ H(A | C) − H(A | B, C) = H(B | C) − H(B | A, C).
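For example, if A and B are independent then I(A; B) = 0, whereas if A = B is a uniformly random bit then I(A; B) = H(A) = 1; in this sense mutual information quantifies the correlation alluded to above.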
The following is a well-known fact on mutual information.
Fact 2.7
Data processing inequality. Suppose we have the following Markov chain:
X → Y → Z,
where X ⊥ Z | Y. Then I(X; Y) ≥ I(X; Z) or, equivalently, H(X|Y) ≤ H(X|Z).
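One way to see this is via the chain rule for mutual information: expanding I(X; Y, Z) in two ways gives
I(X; Z) + I(X; Y | Z) = I(X; Y, Z) = I(X; Y) + I(X; Z | Y).
The Markov property X ⊥ Z | Y means I(X; Z | Y) = 0, so I(X; Y) = I(X; Z) + I(X; Y | Z) ≥ I(X; Z) by the non-negativity of conditional mutual information.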
Mutual information is related to the Kullback-Leibler divergence (also known as relative entropy).
Definition 2.15
Kullback-Leibler divergence. Given two probability distributions μ1 and μ2 on the same sample space Ω such that (∀ω ∈ Ω)(μ2(ω) = 0 ⇒ μ1(ω) = 0), the Kullback-Leibler divergence between them is defined as
D_KL(μ1 ‖ μ2) ≜ Σ_{ω∈Ω} μ1(ω) log (μ1(ω) / μ2(ω)).
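For example, if μ1 and μ2 are Bernoulli distributions with parameters p and q, respectively (with q ∈ (0, 1)), the definition specializes to
D_KL(μ1 ‖ μ2) = p log (p/q) + (1 − p) log ((1 − p)/(1 − q)),
with the usual convention that 0 log 0 = 0.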
The connection between the mutual information and the Kullback-Leibler divergence is provided by the following fact.
Fact 2.8
For random variables A, B, and C we have
I(A; B | C) = E_{(b,c)∼(B,C)} [ D_KL( (A | B = b, C = c) ‖ (A | C = c) ) ],
where (A | ·) denotes the distribution of A conditioned on the corresponding event.
2.7 Useful Lemmata
2.7.1 Concentration
Lemma 2.2
Chernoff bound. Let X1, …, Xn be independent and identically distributed (i.i.d.) random variables taking values in {0, 1} and let p be the probability that Xi = 1; then, for any δ > 0, we have
Pr[ | Σ_{i∈[n]} Xi − pn | > δpn ] ≤ 2e^{−δ²pn/(2+δ)}.
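For instance, setting δ = 1 shows that the probability that Σ_{i∈[n]} Xi deviates from its expectation pn by more than pn is at most 2e^{−pn/3}; more generally, for constant δ the bound is exponentially small in pn.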
2.7.2 Pseudorandomness
Theorem 2.4
k-wise independence Chernoff bound [Schmidt et al. 1995 (Theorem 5.I)]. Let x1, …, xn ∈ [0, 1] be k-wise independent random variables, and let X ≜ Σ_{i∈[n]} xi and μ ≜ E[X]. Then, for any δ ∈ (0, 1],
Pr[ |X − μ| ≥ δμ ] ≤ e^{−Ω(min{k, δ²μ})}.
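In particular, for constant δ and k = Θ(log n), the deviation probability is inverse-polynomial in n whenever μ = Ω(log n); this is the regime relevant to the derandomized partition of Section 2.7.4.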
2.7.3 λ-Biased Sets
Definition 2.16
λ-biased sets. Let G be a finite field, and t > 0 an integer. A multiset S ⊆ G^t is λ-biased if for every non-trivial character χ of G^t,
| E_{s∈S} [χ(s)] | ≤ λ.
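For intuition, when G = F2 the non-trivial characters of G^t are χ_y(x) = (−1)^⟨x,y⟩ for y ≠ 0, and the definition specializes to the familiar notion of a small-bias set over {0, 1}^t: for every y ≠ 0 we have |E_{s∈S}[(−1)^⟨s,y⟩]| ≤ λ, i.e., the inner product of a random element of S with y is nearly unbiased.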
Lemma 2.3
[Azar et al. 1998 (Theorem 3.2)]. A randomly chosen multiset S ⊆ G^t of cardinality Θ(t log |G| / λ²) is λ-biased with high probability.
For many applications, an explicit construction is necessary. In our case, however, we can enumerate over all sets S of sufficient cardinality in quasi-polynomial time. The following Sampling Lemma, due to Ben-Sasson et al. [2003], allows us to estimate the average of any function over G^t using only one line and (1 + o(1)) log₂ |G^t| bits of randomness:
Lemma 2.4
Sampling Lemma [Ben-Sasson et al. 2003 (Lemma 4.3)]. Let B : G^t → [0, 1]. Then, for any ϵ > 0,
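To get intuition for why a single line is a good sampler, consider first a truly uniform line, i.e., x and y drawn uniformly and independently from G^t. For any i ≠ j ∈ G, the pair (x + i·y, x + j·y) is uniform over G^t × G^t, so the |G| points of the line are pairwise independent samples; by Chebyshev's inequality, the empirical average of B over the line deviates from E[B] by more than ϵ with probability at most 1/(4ϵ²|G|). The randomness saving comes from choosing the line more frugally: roughly, only the base point needs to be uniform in G^t (costing log₂ |G^t| bits), while its direction can be drawn from a λ-biased set S as above, costing only log₂ |S| = O(log t + log log |G| + log(1/λ)) additional bits, which is o(log₂ |G^t|) whenever log(1/λ) = o(t log |G|).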
2.7.4 Partitions
Given a 2CSP formula, we provide a few techniques to deterministically partition its n variables into approximately √n subsets of approximately √n variables each.
Greedy Partition
Lemma 2.5
Let G = (V, E) be a d-regular graph and n ≜ |V|. We can partition V into n/k disjoint subsets {S1, …, Sn/k} of size at most 2k such that the number of edges between every pair of subsets satisfies
|E ∩ (Si × Sj)| = O(d²k²/n)  for all i ≠ j ∈ [n/k].   (2.1)
Proof
We assign vertices to subsets iteratively, and show by induction that we can always maintain (2.1) and the bound on the subset sizes. Since the average subset size is less than k, we have by Markov's inequality that at each step less than half of the subsets are full (i.e., already contain 2k vertices). The next vertex we want to assign, v, has neighbors in at most d subsets. By our induction hypothesis, each Si is of size at most 2k, so on average over j ∈ [n/k] it has less than 4dk²/n neighbors in each Sj. Applying Markov's inequality again, Si has at least 8d²k²/n neighbors in less than a (1/2d)-fraction of the subsets Sj. In total, we ruled out less than half of the subsets for being full, and less than half of the subsets for having too many edges to some subset that contains a neighbor of v. Therefore there always exists some subset Si to which we can add v while maintaining the induction hypothesis.
■
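The proof is constructive and translates directly into a greedy procedure. The following Python sketch mirrors it; the data structures, function name, and the exact capacity and edge thresholds are our own illustrative choices following the constants in the proof.

def greedy_partition(n, k, neighbors):
    """Greedily partition the vertices 0, ..., n-1 of a d-regular graph into
    n/k subsets of size at most 2k, mirroring the proof of Lemma 2.5.
    neighbors[v] is the list of v's neighbors."""
    d = len(neighbors[0])                    # regularity of the graph
    m = n // k                               # number of subsets
    cap = 2 * k                              # maximum subset size
    edge_cap = 8 * d * d * k * k / n         # per-pair edge threshold from the proof
    part = [-1] * n                          # part[v] = index of the subset containing v
    size = [0] * m                           # current subset sizes
    cross = [[0] * m for _ in range(m)]      # cross[i][j] = edges seen so far between S_i and S_j

    for v in range(n):
        # subsets that already contain a neighbor of v (at most d of them)
        nbr_parts = {part[u] for u in neighbors[v] if part[u] != -1}
        # pick a subset that is not full and has few edges to each of those
        # subsets; the proof of Lemma 2.5 shows such a subset always exists
        target = next(i for i in range(m)
                      if size[i] < cap
                      and all(cross[i][j] < edge_cap for j in nbr_parts))
        part[v] = target
        size[target] += 1
        for u in neighbors[v]:
            if part[u] != -1 and part[u] != target:
                cross[target][part[u]] += 1
                cross[part[u]][target] += 1
    return part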
Derandomized Partition
We use the Chernoff bound for Θ(log n)-wise independent variables (Theorem 2.4) to deterministically partition the variables into subsets of cardinality ≈ √n.
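For concreteness, here is one standard way such a partition can be realized and made deterministic (a sketch of the generic approach; the polynomial hash family, the parameter choices, and the seed-enumeration step below are illustrative choices rather than a verbatim account of the construction).

import random

def kwise_hash(k, p):
    """Sample a function from a k-wise independent hash family h : Z_p -> Z_p,
    realized as a uniformly random polynomial of degree < k over the prime
    field Z_p; p should be a prime larger than the domain size."""
    coeffs = [random.randrange(p) for _ in range(k)]
    def h(x):
        y = 0
        for c in reversed(coeffs):           # Horner evaluation mod p
            y = (y * x + c) % p
        return y
    return h

def partition_variables(n, num_subsets, k, p):
    """Assign each of n variables to one of num_subsets subsets via a k-wise
    independent hash.  With k = Theta(log n), the k-wise Chernoff bound
    (Theorem 2.4) implies every subset receives n/num_subsets variables up to
    a (1 + delta) factor, for any constant delta > 0, except with
    inverse-polynomial probability (assuming n/num_subsets = Omega(log n) and
    p much larger than num_subsets).  The seed has only k * log2(p) = O(log^2 n)
    bits for p = poly(n), so all seeds can be enumerated in quasi-polynomial
    time, which is one standard way to make the choice deterministic."""
    h = kwise_hash(k, p)
    subsets = [[] for _ in range(num_subsets)]
    for v in range(n):
        subsets[h(v) % num_subsets].append(v)
    return subsets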