Leftover Hash Lemma

Recently, I’ve been investigating computational notions of entropy and had to use the Leftover Hash Lemma (a variant of which is stated below). I first encountered the lemma several years ago but didn’t have a use for it… until now!

The lemma is attributed to Impagliazzo, Levin, and Luby [1]. A corollary of the lemma is that one can convert a source with (high-enough) Rényi entropy into a distribution that is close to uniform. Before stating the lemma, I’ll discuss a few different notions of entropy, including the classic Shannon entropy, min-entropy, max-entropy, and so on. See [2] for many applications of these entropy measures.

Entropy Measures

Consider a random variable X. We use x\xleftarrow{R} X to denote that the element x is drawn at random from X, and we denote the support of X by \text{Supp}(X). Define the sample entropy of x with respect to X as H_X(x) = \log\frac{1}{\mathbb{P}[X=x]}, with H_X(x) = \infty when x\notin\text{Supp}(X). The sample entropy measures how much randomness is present in the sample x when it is generated according to the law of X. We can then state the entropy measures in terms of the sample entropy:

  • Shannon Entropy: H(X) = \mathbb{E}_{x\xleftarrow{R} X}[H_X(x)]
  • Min-Entropy: H_\infty(X) = \min_{x\in\text{Supp}(X)}H_X(x)
  • Rényi Entropy (of order 2, also called collision entropy): H_2(X) = -\log\sum_{x\in\text{Supp}(X)}\mathbb{P}[X=x]^2
  • Max-Entropy: H_0(X) = \log(|\text{Supp}(X)|)

How should we interpret these measures? The min-entropy can be seen as a worst-case measure of how “random” a random variable is. The Rényi entropy measure, intuitively, measures how “collision-resistant” a random variable is (i.e., think hash functions). In my opinion, max-entropy does not give much information, except for how large the support of a random variable is. These entropy measures are related by this inequality:

H_\infty(X)\leq H_2(X) \leq H(X) \leq H_0(X)
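To make these measures concrete, here is a minimal sketch in Python (the example distribution is an arbitrary choice of mine) that computes all four quantities for a finite distribution and checks the chain of inequalities:

import math

def shannon(p):
    return sum(q * math.log2(1 / q) for q in p if q > 0)

def min_entropy(p):
    return math.log2(1 / max(q for q in p if q > 0))

def renyi2(p):
    return -math.log2(sum(q * q for q in p if q > 0))

def max_entropy(p):
    return math.log2(sum(1 for q in p if q > 0))

# Example distribution over a 4-element support (probabilities sum to 1).
p = [0.5, 0.25, 0.125, 0.125]

h_min, h_2, h_sh, h_0 = min_entropy(p), renyi2(p), shannon(p), max_entropy(p)
print(h_min, h_2, h_sh, h_0)            # 1.0, ~1.54, 1.75, 2.0
assert h_min <= h_2 <= h_sh <= h_0      # the chain of inequalities above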

The inequalities above hold with equality if and only if X is uniformly distributed on its support. The statement of the lemma below uses universal hash functions. Here is a definition:

A function family \mathcal{H} = \{h:\mathcal{D}\mapsto\mathcal{R}\} is two-universal if \forall x\neq x'\in\mathcal{D}, the following holds: \mathbb{P}_{h\xleftarrow{R}\mathcal{H}}[h(x) = h(x')]\leq 1/|\mathcal{R}|.
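One concrete example of such a family (not the only one) is the family of linear maps over GF(2): h_M(x) = Mx, where M is a uniformly random m\times n bit matrix. For x\neq x', the vector M(x\oplus x') is uniform over \{0,1\}^m, so a collision happens with probability exactly 2^{-m} = 1/|\mathcal{R}|. Below is a minimal Python sketch with an empirical check; the parameters n, m, the test inputs, and the function names are arbitrary choices of mine.

import random

n, m = 16, 4          # domain {0,1}^16, range {0,1}^4 (arbitrary choices)

def random_hash():
    # A hash function is an m x n random bit matrix M; h(x) = M x over GF(2).
    return [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]

def apply_hash(M, x):
    # x is an integer interpreted as an n-bit vector.
    bits = [(x >> i) & 1 for i in range(n)]
    return tuple(sum(M[r][i] & bits[i] for i in range(n)) % 2 for r in range(m))

x, x_prime = 12345, 54321     # any two distinct inputs
trials = 20000
collisions = sum(
    apply_hash(M, x) == apply_hash(M, x_prime)
    for M in (random_hash() for _ in range(trials))
)
print(collisions / trials, 1 / 2**m)   # empirical collision rate vs. bound 1/|R| = 1/16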

The Lemma

Statement: Let X be a random variable over \{0, 1\}^n with H_2(X)\geq k, and let \mathcal{H} = \{h:\{0, 1\}^n\to\{0, 1\}^m\} be a two-universal function family. Then, for H\xleftarrow{R}\mathcal{H} drawn uniformly at random, the statistical distance between (H, H(X)) and (H, \mathcal{U}_m) is at most \frac{1}{2}\cdot 2^{(m-k)/2}, where \mathcal{U}_m denotes the uniform distribution over \{0, 1\}^m.

One can interpret the statement above as saying that you can convert a random variable with high-enough Rényi entropy into a random variable that is very close to uniform.
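To see the bound in action, here is a small Python simulation, reusing the GF(2) matrix family from the sketch above (repeated so the snippet stands alone). All parameters are my own arbitrary choices; the source is a flat distribution, uniform on 2^k points, which has H_2(X) = k. The average of SD(H(X), \mathcal{U}_m) over random H equals the statistical distance between (H, H(X)) and (H, \mathcal{U}_m).

import itertools
import random
from collections import Counter

n, m, k = 12, 4, 8                          # arbitrary parameters with k >= m
support = random.sample(range(2**n), 2**k)  # flat source: uniform on 2^k points, H_2(X) = k

def random_hash():
    return [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]

def apply_hash(M, x):
    bits = [(x >> i) & 1 for i in range(n)]
    return tuple(sum(M[r][i] & bits[i] for i in range(n)) % 2 for r in range(m))

def stat_dist_to_uniform(M):
    # Exact distribution of H(X) for this hash, then SD(H(X), U_m).
    counts = Counter(apply_hash(M, x) for x in support)
    return 0.5 * sum(abs(counts.get(y, 0) / len(support) - 2**-m)
                     for y in itertools.product((0, 1), repeat=m))

# Average over random hash functions approximates SD((H, H(X)), (H, U_m)).
estimate = sum(stat_dist_to_uniform(random_hash()) for _ in range(200)) / 200
print(estimate, 0.5 * 2**((m - k) / 2))     # empirical estimate vs. the bound of 1/8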

References

[1] Russell Impagliazzo, Leonid A. Levin, and Michael Luby. Pseudo-random generation from one-way functions. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing (STOC), 1989.

[2] Iftach Haitner and Salil Vadhan. The Many Entropies in One-Way Functions. In Tutorials on the Foundations of Cryptography, pages 159–217. Springer International Publishing, 2017.

Eigenvalues and Laplacian of a Graph: A Useful Lemma

Recently, for one of my projects, I have been using the Laplacian matrix to study the spectra of graphs. In this post, I’ll review a tiny bit of background material from matrix theory for spectral graph theory. Most of the material in this post can be found in the survey by Chung [1].

What is the significance of eigenvalues and the Laplacian in understanding the properties and structure of graphs? Below, we highlight examples that show a connection between graph eigenvalues and the geometric aspects of graphs.

The Laplacian

Let G be a graph, and denote the degree of vertex v by d_v. Consider the matrix L with entries L(u, v) = d_v if u = v, L(u, v) = -1 if u and v are adjacent, and L(u, v) = 0 otherwise.

The Laplacian matrix \mathcal{L} is then defined by \mathcal{L} = T^{-1/2}LT^{-1/2}, where T is the diagonal matrix with (v, v)-th entry having value d_v (with the convention that the corresponding entry of T^{-1/2} is 0 when d_v = 0). (This definition of the Laplacian is for graphs without loops or multiple edges.)
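Here is a minimal Python sketch (using numpy, with a small example graph of my choosing) that builds L, T, and \mathcal{L} from an adjacency matrix and computes the eigenvalues of \mathcal{L}:

import numpy as np

# Adjacency matrix of a small example graph: the path on 4 vertices.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

degrees = A.sum(axis=1)                  # d_v for each vertex v
T = np.diag(degrees)                     # T: diagonal degree matrix
L = T - A                                # L(v,v) = d_v, L(u,v) = -1 if u ~ v

# Normalized Laplacian: \mathcal{L} = T^{-1/2} L T^{-1/2}
T_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))   # assumes no isolated vertices
calL = T_inv_sqrt @ L @ T_inv_sqrt

eigenvalues = np.linalg.eigvalsh(calL)   # \mathcal{L} is symmetric, so use eigvalsh
print(np.round(eigenvalues, 4))          # for P_4: 0, 0.5, 1.5, 2, matching 1 - cos(pi*k/3)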

Examples

Examples of special graphs and the eigenvalues of their Laplacians (numerically checked in the sketch after this list):

  1. The star graph S_n on n vertices has eigenvalues 0, 1 (with multiplicity n-2), and 2.
  2. For the complete graph K_n on n vertices, the eigenvalues are 0 and \frac{n}{n - 1} with multiplicity n - 1.
  3. For the path P_n on n vertices, the eigenvalues are 1 - \cos\left(\frac{\pi k}{n-1}\right) for k = 0, 1, \ldots, n - 1.
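A short numerical check of these three examples in Python (n = 5 is an arbitrary choice; the normalized-Laplacian construction is repeated from the previous sketch so this snippet stands alone):

import numpy as np

def normalized_laplacian(A):
    d = A.sum(axis=1)
    T_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return T_inv_sqrt @ (np.diag(d) - A) @ T_inv_sqrt

n = 5
star = np.zeros((n, n))                                          # S_5: center = vertex 0
star[0, 1:] = 1
star[1:, 0] = 1
complete = np.ones((n, n)) - np.eye(n)                           # K_5
path = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # P_5

for name, A in [("star", star), ("complete", complete), ("path", path)]:
    print(name, np.round(np.linalg.eigvalsh(normalized_laplacian(A)), 4))

# Expected: star -> 0, 1 (x3), 2 ; complete -> 0, 5/4 (x4) ;
# path -> 1 - cos(pi*k/4) for k = 0, ..., 4.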

Basic Facts About the Spectrum of a Graph

Lemma (e.g., see Lemma 1.7 in [1]):
For a graph G on n vertices, with the eigenvalues of \mathcal{L} ordered as 0 = \lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_{n-1}, we have:

  1. \sum_i \lambda_i \leq n with equality holding if and only if G has no isolated vertices.
  2. If G is connected, then \lambda_1 > 0. If \lambda_i = 0 and \lambda_{i+1} \neq 0, then G has exactly i + 1 connected components.
  3. The spectrum of a graph is the union of the spectra of its connected components.
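As a quick numerical sanity check of items 1 and 2 (reusing the construction from the sketches above; the example graph is my own), take a graph made of two disjoint triangles: it has no isolated vertices and two connected components, so the eigenvalues sum to n = 6 and exactly two of them are zero.

import numpy as np

def normalized_laplacian(A):
    d = A.sum(axis=1)
    T_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return T_inv_sqrt @ (np.diag(d) - A) @ T_inv_sqrt

# Two disjoint triangles: 6 vertices, 2 connected components.
tri = np.ones((3, 3)) - np.eye(3)
A = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])

eigs = np.linalg.eigvalsh(normalized_laplacian(A))
print(np.round(eigs, 4))                                        # 0, 0, 1.5, 1.5, 1.5, 1.5
print("sum =", round(eigs.sum(), 4))                            # = 6 (no isolated vertices)
print("zero eigenvalues =", int(np.sum(np.isclose(eigs, 0))))   # = 2 components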

Exercise for the reader: Can you rewrite the lemma in terms of the Laplacian?

[1] F. R. K. Chung. Spectral Graph Theory. CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.

On Semi-Supervised Node Labeling

Today is leap year day. I tried to weave the occasion into this short blog post but failed. Happy leap year day! I guess I have to wait until 2028 to write a blog post that has something to do with the day. The rest of the post has nothing to do with leap years.

Recently, I have been investigating the theory of the “node labeling” problem: given a graph in which only some nodes are labeled, use those labels to propagate labels to all the other nodes. Large network graphs exhibit statistical structure that can be analyzed to gain deeper insight into complex systems, and a key challenge in semi-supervised node classification is assigning labels to all unlabeled nodes in such a graph.

Node Labeling Problem

The problem concerns classifying unlabeled nodes in a graph using information about the already-labeled nodes. Here are some applications:

  1. Recommendation systems based on the interests of other individuals in the network
  2. Advertising systems
  3. Sociological studies of communities (e.g., inferring community affinity) and studying how ideas/memes spread

One could ask: why not ask users to self-label? There are several issues with requiring self-labels: users may fail to update their labels; out of privacy concerns, users may distort their information; and users may post misleading information. As a side note, sociologists have historically used financial incentives to motivate individuals to provide accurate labels [1].

Now, to actually solve the problem, there seem to be two broad types of techniques in use:

  1. Classification: Use graph information as features into a traditional classification system.
  2. Random Walks: Propagate existing labels via random walks.

Classification Methods

Essentially, these iterative methods use local neighborhood information to generate features that can be used to learn local classifiers. A problem with this approach, however, is that basing labels solely on the labels of other nodes can leave isolated components unlabeled.
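A minimal sketch of this idea in Python (my own toy construction using scikit-learn; the graph, the feature design, and all names are arbitrary choices): each node’s feature vector is the label histogram of its neighbors, a classifier is trained on the labeled nodes, and the remaining nodes are re-classified over a few iterations.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy graph: adjacency lists for 6 nodes forming two loose clusters.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {0: 0, 1: 0, 4: 1}             # known labels; nodes 2, 3, 5 are unlabeled
n_classes = 2

def neighbor_features(node, current):
    # Feature vector = histogram of the (currently assigned) labels of neighbors.
    hist = np.zeros(n_classes)
    for nb in adj[node]:
        if current.get(nb) is not None:
            hist[current[nb]] += 1
    return hist

current = dict(labels)                  # working label assignment
for _ in range(5):                      # a few rounds of iterative classification
    X_train = np.array([neighbor_features(v, current) for v in labels])
    y_train = np.array([labels[v] for v in labels])
    clf = LogisticRegression().fit(X_train, y_train)
    for v in adj:
        if v not in labels:             # re-classify every unlabeled node
            current[v] = int(clf.predict([neighbor_features(v, current)])[0])

print(current)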

Random Walks

The method starts from a set of labeled nodes and performs a random walk on the graph. For this method to provide a complete labeling, every unlabeled node in the graph is assumed to be reachable (in finite steps) from some labeled node. This is called label connectivity.
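Here is a minimal sketch of one common random-walk view (my own toy version, on the same hypothetical graph as above): each unlabeled node repeatedly averages the label distributions of its neighbors while labeled nodes stay clamped. The fixed point gives, for each unlabeled node, the probability that a random walk started there is absorbed at a node carrying each label, which is why label connectivity is needed.

import numpy as np

# Same toy graph as before: 6 nodes, labels known for nodes 0, 1, and 4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {0: 0, 1: 0, 4: 1}
n_classes = 2

# Label distribution per node: one-hot for labeled nodes, uniform to start otherwise.
dist = {v: np.full(n_classes, 1.0 / n_classes) for v in adj}
for v, c in labels.items():
    dist[v] = np.eye(n_classes)[c]

# Assumes every unlabeled node can reach a labeled node (label connectivity).
for _ in range(100):                       # iterate to (approximate) convergence
    for v in adj:
        if v not in labels:                # labeled nodes stay clamped
            dist[v] = np.mean([dist[nb] for nb in adj[v]], axis=0)

predicted = {v: int(np.argmax(dist[v])) for v in adj}
print(predicted)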

[1] Wayne W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.