Fall 2025: Topics in Information-Theoretic Cryptography

Since the beginning of this year, I have been developing a course on “Topics in Information-Theoretic Cryptography”, and it was recently approved for Fall 2025. I’m very excited to share some research with undergraduate and graduate students! Below is some relevant information about the course.

Course Number and Title

ECE598DA: Topics in Information-Theoretic Cryptography

Description

In this course, we will study foundational and recent work on the use of information theory to design and analyze cryptographic protocols. We will begin by studying privacy attacks which motivate strong privacy and security definitions. Then, we will explore the basics of differential privacy and study some core works on zero-knowledge proofs. Finally, we will explore various applications, including watermarking of generative models.

Recommended Textbooks

  • Introduction to Cryptography with Coding Theory. By Wade Trappe, Lawrence C. Washington.
  • Tutorials on the Foundations of Cryptography. Edited by Yehuda Lindell.

Syllabus

Week 1: Introduction: motivations, one-time pad review, review of probability theory

Week 2: Attacks and Composition Theorems for Differential Privacy

Week 3: Standard Mechanisms for Differential Privacy

Week 4: Information-Theoretic Lower Bounds for Differential Privacy

Week 5: Differentially Private Statistical Estimation and Testing

Week 6: Zero-Knowledge Proofs

Week 7: Statistical Zero-Knowledge Proofs: Part I

Week 8: Statistical Zero-Knowledge Proofs: Part II

Week 9: Multi-Party Computation

Week 10: Multi-Party and Computational Differential Privacy

Week 11: Code-Based Cryptography: Part I

Week 12: Code-Based Cryptography: Part II

Week 13: More Applications

  • Watermarking of Generative Models
  • Proof Systems for Machine Learning
  • Bounded-Storage Cryptography
  • Quantum Cryptography

Week 14: Project Presentations

Eigenvalues and Laplacian of a Graph: A Useful Lemma

Recently, for one of my projects, I have been using the Laplacian matrix to study the spectra of graphs. In this post, I’ll review a bit of background material on matrix theory for spectral graph theory. The majority of the material in this post can be found in the monograph by Chung (1997) [1].

What is the significance of eigenvalues and the Laplacian in understanding the properties and structure of graphs? Below, we highlight examples that show a connection between graph eigenvalues and the geometric aspects of graphs.

The Laplacian

Let G be a graph with vertex degree denoted by d_v for vertex v. Consider the matrix L defined entrywise by

L(u, v) = d_v if u = v; L(u, v) = -1 if u and v are adjacent; and L(u, v) = 0 otherwise.

The Laplacian matrix \mathcal{L} is then defined by \mathcal{L} = T^{-1/2}LT^{-1/2}, where T is the diagonal matrix with (v, v)-th entry having value d_v, under the convention that T^{-1}(v, v) = 0 when d_v = 0. (This definition of the Laplacian is for graphs without loops and multiple edges.)

Examples

Examples of special graphs and the eigenvalues of their Laplacians:

  1. The star graph S_n on n vertices has eigenvalues 0, 1 (with multiplicity n-2), and 2.
  2. For the complete graph K_n on n vertices, the eigenvalues are 0 and \frac{n}{n - 1} with multiplicity n - 1.
  3. For the path P_n on n vertices, the eigenvalues are 1 - \cos\left(\frac{\pi k}{n-1}\right) for k = 0, 1, \ldots, n - 1.
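The examples above are easy to check numerically. Below is a small sketch (assuming numpy) that builds Chung's normalized Laplacian from an adjacency matrix and verifies the spectrum of the star graph S_5:

```python
import numpy as np

def normalized_laplacian(adj):
    """Chung's normalized Laplacian: T^{-1/2} (T - A) T^{-1/2},
    with rows/columns of isolated vertices zeroed out."""
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj                   # combinatorial Laplacian L = T - A
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    return lap * np.outer(inv_sqrt, inv_sqrt)  # scale rows and columns by d_v^{-1/2}

# Star graph S_5: vertex 0 is the center, vertices 1..4 are the leaves
n = 5
adj = np.zeros((n, n))
adj[0, 1:] = 1.0
adj[1:, 0] = 1.0
eig = np.sort(np.linalg.eigvalsh(normalized_laplacian(adj)))
# Expected spectrum: 0, then 1 with multiplicity n - 2 = 3, then 2
```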

Basic fact about the spectrum of a graph

Lemma (e.g., see Lemma 1.7 in [1]):
For a graph G on n vertices with Laplacian eigenvalues 0 = \lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_{n-1}, we have:

  1. \sum_i \lambda_i \leq n with equality holding if and only if G has no isolated vertices.
  2. If G is connected, then \lambda_1 > 0. If \lambda_i = 0 and \lambda_{i+1} \neq 0, then G has exactly i + 1 connected components.
  3. The spectrum of a graph is the union of the spectra of its connected components.
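Part 2 of the lemma can be checked numerically: the multiplicity of the eigenvalue 0 equals the number of connected components. A quick sketch (assuming numpy) for the disjoint union of two triangles:

```python
import numpy as np

# Two disjoint triangles: a block-diagonal adjacency matrix on 6 vertices
tri = np.ones((3, 3)) - np.eye(3)
adj = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])

deg = adj.sum(axis=1)
lap = np.diag(deg) - adj                        # combinatorial Laplacian
norm_lap = lap / np.sqrt(np.outer(deg, deg))    # T^{-1/2} L T^{-1/2}

eig = np.linalg.eigvalsh(norm_lap)
num_zero = int(np.sum(np.abs(eig) < 1e-9))      # multiplicity of eigenvalue 0
# num_zero equals the number of connected components, here 2
```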

Exercise for the reader: Can you rewrite the lemma in terms of the Laplacian?

[1] F.R.K. Chung. Spectral Graph Theory. Conference Board of Mathematical Sciences. American Mathematical Society, 1997.

Cardinality Estimation in Linear Time using Sub-Linear Space


The cardinality of a collection A (which might be an ordered or unordered list, a set, or something else) is the number of distinct values in A. For example, the collections [1,2,3,4] and [1,2,1,3,1,4,3] both have cardinality 4 (and also correspond to the same set).

Determining the Cardinality of a Collection: The Naive Approach

Consider a collection A=[1,2,1,3,1,4,3]. How can we systematically determine the cardinality of A? Well, here are two of many ways to do this:

  1. First sort A in ascending order. Then, perform a linear scan on A, counting each value only the first time it appears (this skips the duplicates, which are now adjacent). If the initial size of A is n, then the cardinality of A can be determined this way in O(n \log n) time (using an in-place sort such as heapsort) and O(1) extra space.
  2. Use a hash table: perform a linear scan of A, inserting each value into the table. The cardinality of A is then the number of keys in the table. This uses O(n) time but also O(n) extra space.
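The hash-table approach is a few lines in Python, since the built-in set is itself backed by a hash table:

```python
def cardinality(values):
    """Exact cardinality: insert every value into a hash-backed set
    and count the distinct keys. O(n) time, O(n) extra space."""
    seen = set()
    for v in values:
        seen.add(v)
    return len(seen)

print(cardinality([1, 2, 1, 3, 1, 4, 3]))  # prints 4
```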

Notice that no exact algorithm can do better than O(n) time, because it must examine the entire input (which has size n). But can we determine the cardinality of A in O(n) time using sub-linear, i.e., o(n), extra space?

That’s where probability comes in.

Linear Probabilistic Counting

This is a probabilistic algorithm for counting the number of unique values in a collection. It produces an estimate whose accuracy, like the (small) amount of space it uses, can be pre-specified by the user. The accuracy of linear counting depends on the load factor (think hash tables), which is the number of unique values in the collection divided by the size of the bit map. The larger the load factor, the less accurate the result of the linear probabilistic counter; correspondingly, the smaller the load factor, the more accurate the result. Nevertheless, load factors much higher than 1 (e.g., 16) can be used while still achieving high accuracy in cardinality estimation (e.g., <1% error).

Note: in simple hashing, the load factor cannot exceed 1, but in linear counting it can.

Linear counting is a two-step process. In step 1, the algorithm allocates a bit map of some size m in main memory, with all entries initialized to 0. The algorithm then scans the collection and applies a hash function to each data value. The hash function generates a bit map address, and the algorithm sets that addressed bit to 1. In step 2, the algorithm counts the number of empty bit map entries (that is, entries still set to 0) and divides this count by the bit map size m, obtaining the fraction of empty bit map entries; call this V_n.

Plug V_n and m into the equation r = -m \ln V_n to obtain r, which is the estimated cardinality of the collection. (Note the minus sign: V_n < 1, so \ln V_n is negative.) The derivation of this equation is detailed in [1].

Here’s my simple implementation of the Linear Probabilistic Counting Algorithm.
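For reference, here is a minimal self-contained sketch of the two-step procedure described above. The bit map size m and the use of SHA-256 as the hash function are illustrative choices, not part of the algorithm itself:

```python
import hashlib
import math

def linear_counting_estimate(values, m):
    """Estimate the number of distinct values with a bit map of size m."""
    # Step 1: hash every value to a bit map address in [0, m) and set that bit
    bitmap = [0] * m
    for v in values:
        addr = int(hashlib.sha256(str(v).encode()).hexdigest(), 16) % m
        bitmap[addr] = 1
    # Step 2: fraction of empty entries, then r = -m * ln(V_n)
    zeros = bitmap.count(0)
    if zeros == 0:
        raise ValueError("bit map saturated; increase m")
    v_n = zeros / m
    return -m * math.log(v_n)

# 1000 distinct values, bit map of size 4096 (load factor ~0.24)
est = linear_counting_estimate(range(1000), m=4096)
```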

Errors in Counting

Although the linear probabilistic counting method is faster than the deterministic approaches above, its estimate carries some error, as explained. So this method should be used only when 100% accuracy is not required, for example when estimating the number of unique visitors to a website.

Probabilistic Alternatives to Linear Probabilistic Counting

The HyperLogLog algorithm is one such alternative. It also runs in linear time (linear in the size of the initial collection), but the HyperLogLog counter usually gives a more accurate estimate of the cardinality while also using less space. See this paper for more details on the HyperLogLog counter.
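To give a flavor of HyperLogLog, here is a compact sketch of the raw estimator only (it omits the small- and large-range corrections from the paper; the register count m = 2^b and SHA-256 as the hash are illustrative choices). Each value's hash is split into a register index and a suffix, and each register remembers the maximum number of leading zeros seen:

```python
import hashlib

def hyperloglog_estimate(values, b=10):
    """Raw HyperLogLog estimate with m = 2^b registers."""
    m = 1 << b
    registers = [0] * m
    for v in values:
        x = int(hashlib.sha256(str(v).encode()).hexdigest(), 16) & ((1 << 64) - 1)
        j = x & (m - 1)                       # low b bits pick a register
        w = x >> b                            # remaining 64 - b bits
        rho = (64 - b) - w.bit_length() + 1   # position of leftmost 1-bit in w
        registers[j] = max(registers[j], rho)
    alpha = 0.7213 / (1 + 1.079 / m)          # bias-correction constant (m >= 128)
    return alpha * m * m / sum(2.0 ** -r for r in registers)

# 10000 distinct values with 1024 registers (~5% typical error at this size)
est = hyperloglog_estimate(range(10000), b=10)
```

Note the space usage: 1024 small registers, regardless of how large the collection grows, versus a bit map whose size must scale with the cardinality in linear counting.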

References

[1] K.-Y. Whang, B. T. Vander-Zanden, H. M. Taylor. A Linear-Time Probabilistic Counting Algorithm for Database Applications. ACM Transactions on Database Systems, 1990.