Leftover Hash Lemma

Recently, I’ve been investigating computational notions of entropy and had to use the Leftover Hash Lemma (a variant of which is stated below). I first encountered the lemma several years ago but didn’t have a use for it… until now!

The lemma is attributed to Impagliazzo, Levin, and Luby [1]. A corollary of the lemma is that one can convert a source with high enough Rényi entropy into a distribution that is close to uniform. Before stating the lemma, I’ll discuss a few different notions of entropy: the classic Shannon entropy, min-entropy, max-entropy, and so on. See [2] for many applications of these entropy measures.

Entropy Measures

Consider a random variable X. We use x\xleftarrow{R} X to denote that the element x is drawn according to the distribution of X, and we denote the support of X by \text{Supp}(X). Define the sample entropy of x with respect to X as H_X(x) = \log\frac{1}{\mathbb{P}[X=x]}, with the convention that H_X(x) = \infty when x\notin\text{Supp}(X). The sample entropy measures how much randomness is present in the sample x when it is generated according to the distribution of X. We can now state the entropy measures in terms of the sample entropy (a small code sketch follows the list below):

  • Shannon Entropy: H(X) = \mathbb{E}_{x\xleftarrow{R} X}[H_X(x)]
  • Min-Entropy: H_\infty(X) = \min_{x\in\text{Supp}(X)}H_X(x)
  • Rényi Entropy (of order 2, a.k.a. collision entropy): H_2(X) = -\log\sum_{x\in\text{Supp}(X)}\mathbb{P}[X=x]^2
  • Max-Entropy: H_0(X) = \log(|\text{Supp}(X)|)
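
To make these definitions concrete, here is a small Python sketch (my own illustrative code; the function names and the example distribution are mine, and all logarithms are taken base 2) that computes each measure for a distribution given as a dictionary of probabilities:

```python
import math

def sample_entropy(p, x):
    """H_X(x) = log2(1 / P[X = x]); infinite outside the support."""
    return math.inf if p.get(x, 0.0) == 0.0 else -math.log2(p[x])

def shannon_entropy(p):
    """H(X): expectation of the sample entropy under p."""
    return sum(px * -math.log2(px) for px in p.values() if px > 0)

def min_entropy(p):
    """H_inf(X): smallest sample entropy over the support."""
    return -math.log2(max(px for px in p.values() if px > 0))

def renyi2_entropy(p):
    """H_2(X): minus the log of the collision probability sum_x p(x)^2."""
    return -math.log2(sum(px * px for px in p.values() if px > 0))

def max_entropy(p):
    """H_0(X): log2 of the support size."""
    return math.log2(sum(1 for px in p.values() if px > 0))

# A biased coin: P[X = 0] = 3/4, P[X = 1] = 1/4.
p = {0: 0.75, 1: 0.25}
print(min_entropy(p), renyi2_entropy(p), shannon_entropy(p), max_entropy(p))
# roughly 0.415, 0.678, 0.811, 1.0
```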

How should we interpret these measures? The min-entropy is a worst-case measure of how “random” a random variable is: it is determined entirely by the most likely outcome. The Rényi entropy, intuitively, measures how “collision-resistant” a random variable is: 2^{-H_2(X)} is exactly the probability that two independent samples of X collide (think hash functions). In my opinion, max-entropy does not give much information beyond how large the support of the random variable is. These entropy measures are related by the following chain of inequalities:

H_\infty(X)\leq H_2(X) \leq H(X) \leq H_0(X)
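
As a quick sanity check (my own toy example, with logarithms base 2), take X with \mathbb{P}[X=0]=3/4 and \mathbb{P}[X=1]=1/4: then H_\infty(X)=\log\frac{4}{3}\approx 0.415, H_2(X)=-\log\left(\frac{9}{16}+\frac{1}{16}\right)\approx 0.678, H(X)\approx 0.811, and H_0(X)=\log 2=1, so the chain holds with every inequality strict, matching the output of the code sketch above.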

Every inequality above holds with equality if and only if X is uniformly distributed on its support. The statement of the lemma below uses two-universal hash functions. Here is a definition:

A function family \mathcal{H} = \{h:\mathcal{D}\to\mathcal{R}\} is two-universal if \forall x\neq x'\in\mathcal{D}, the following holds: \mathbb{P}_{h\xleftarrow{R}\mathcal{H}}[h(x) = h(x')]\leq 1/|\mathcal{R}|.
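
A standard example of such a family (my own choice for illustration; any two-universal family works for the lemma) is the set of GF(2)-linear maps h_A(x) = Ax for an m\times n bit matrix A: for any x\neq x', \mathbb{P}_A[Ax = Ax'] = \mathbb{P}_A[A(x\oplus x') = 0] = 2^{-m}. Here is a rough sketch, with n-bit strings represented as Python integers:

```python
import random

def random_hash(n, m):
    """Sample h uniformly from the two-universal family {x -> A.x over GF(2)},
    where A is a uniformly random m x n bit matrix (stored as m row bitmasks)."""
    rows = [random.getrandbits(n) for _ in range(m)]
    def h(x):
        # Output bit i is the parity (GF(2) inner product) of row i and x.
        return sum((bin(rows[i] & x).count("1") & 1) << i for i in range(m))
    return h

# Empirical check of two-universality for one fixed pair x != x'.
n, m, trials = 16, 4, 20_000
x, x_prime = 0b1010101010101010, 0b0000111100001111
collisions = 0
for _ in range(trials):
    h = random_hash(n, m)
    if h(x) == h(x_prime):
        collisions += 1
print(collisions / trials, "<= 1/|R| =", 1 / 2 ** m)  # both roughly 0.0625
```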

The Lemma

Statement: Let X be a random variable over \{0, 1\}^n with H_2(X)\geq k, and let \mathcal{H} = \{h:\{0, 1\}^n\to\{0, 1\}^m\} be a two-universal function family. Then, for H drawn uniformly at random from \mathcal{H}, the statistical distance between (H, H(X)) and (H, \mathcal{U}_m) is at most \frac{1}{2}\cdot 2^{(m-k)/2}, where \mathcal{U}_m denotes the uniform distribution over \{0, 1\}^m.

One can interpret the statement above as saying that a two-universal hash function converts a random variable with high enough Rényi entropy into one that is very close to uniform, even given the choice of hash function: as long as the output length m is sufficiently smaller than the entropy k, the bound \frac{1}{2}\cdot 2^{(m-k)/2} is tiny.
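
To see the lemma in action, here is a toy experiment (my own sketch; the parameters n = 12, k = 8, m = 4 and the use of the GF(2)-linear family above are arbitrary choices): take X uniform over a random set of 2^k strings, so that H_2(X) = k exactly, and estimate the statistical distance of (H, H(X)) from (H, \mathcal{U}_m), which equals the average, over the choice of hash function, of the distance of H(X) from uniform. The printed estimate should come out below the lemma’s bound of \frac{1}{2}\cdot 2^{(m-k)/2} = 0.125.

```python
import random

def random_hash(n, m):
    """h(x) = A.x over GF(2) for a uniformly random m x n bit matrix A (as above)."""
    rows = [random.getrandbits(n) for _ in range(m)]
    return lambda x: sum((bin(rows[i] & x).count("1") & 1) << i for i in range(m))

n, k, m = 12, 8, 4                  # input length, Renyi entropy, output length
bound = 0.5 * 2 ** ((m - k) / 2)    # the lemma's bound: (1/2) * 2^((m-k)/2) = 0.125

# X is uniform over a random subset S of {0,1}^n with |S| = 2^k, so H_2(X) = k.
support = random.sample(range(2 ** n), 2 ** k)

def dist_from_uniform(h):
    """Exact statistical distance between h(X) and the uniform distribution on {0,1}^m."""
    counts = [0] * (2 ** m)
    for x in support:
        counts[h(x)] += 1
    return 0.5 * sum(abs(c / len(support) - 1 / 2 ** m) for c in counts)

# SD((H, H(X)), (H, U_m)) equals the average of SD(h(X), U_m) over the choice of h,
# so estimate it by averaging over a few hundred sampled hash functions.
hashes = [random_hash(n, m) for _ in range(300)]
estimate = sum(dist_from_uniform(h) for h in hashes) / len(hashes)
print(f"estimated distance {estimate:.4f} vs bound {bound:.4f}")
```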

References

[1] Russell Impagliazzo, Leonid A. Levin, and Michael Luby. Pseudo-random Generation from One-way Functions. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing (STOC), 1989.

[2] Iftach Haitner and Salil Vadhan. The Many Entropies in One-Way Functions. In Tutorials on the Foundations of Cryptography, pages 159–217. Springer International Publishing, 2017.
