From Postdoc Notes to a Full Textbook

During the 2024–2025 academic year, I decided to start writing detailed lecture notes on Topics in Information-Theoretic Cryptography (https://dacesresearch.org/infocrypto/). At the time, I was still thinking about research (e.g., in differential privacy, zero-knowledge, and information-theoretic security more broadly) while also preparing to transition into my faculty role at UIUC.

During that period, I started drafting early versions of the lecture notes that would eventually form the backbone of my Fall 2025 graduate course at UIUC. These weren’t intended to be a book (at least, not at first). They were simply my attempt to consolidate ideas I was using in my research (from fingerprinting lower bounds to statistical zero-knowledge to watermarking generative models) into a cohesive pedagogical narrative.

I experimented heavily with new ways to explain familiar concepts. I rewrote some proofs repeatedly. I paired classical topics (e.g., the One-Time Pad and Shannon entropy) with modern concerns such as data-market privacy risks, statistical attacks on machine learning models, and quantum-era cryptographic threats.

By the time I arrived at UIUC in Fall 2025, the notes had already grown into something far larger than a lecture packet. Teaching the course from these notes, and expanding them week after week, revealed that they could become more than supplementary material. Maybe these notes could become a full textbook?

This blog post is a reflection on that journey: how the material grew, what the book covers, and the many people and institutions who made it possible.

Reflections on the Process

1. Writing revealed connections I hadn’t noticed before.

Integrating ZK, DP, MPC, and quantum topics forced me to articulate the conceptual threads uniting them.

2. Student questions shaped the clarity of the exposition.

When multiple students struggled with the same definition, I rewrote it. Many of those improved explanations are now part of the book.

3. Compiling a textbook is a creative research act.

Several new lemmas, interpretations, and frameworks arose during the writing (simply from trying to explain concepts more cleanly).

Book Chapters

Compiling the textbook required reorganizing an entire semester’s worth of evolving lecture notes into a coherent structure that, I hope, could guide a reader from the basics of probability to the frontiers of modern security. Below is a thematic overview of how the chapters came together.

1. Foundations

The book opens with a modern introduction to cryptography, revisiting the motivations, core goals, and roles of secrecy, randomness, and adversaries. It then transitions through a detailed review of probability (e.g., expectation, independence, conditional distributions) and into essential tools from information theory.

I believe this foundation anchors the rest of the text and supports the many advanced topics that follow.

2. Attacks That Motivate the Theory

A distinctive early feature of the book is its chapter on attacks, including:

  • reconstruction attacks
  • chosen-plaintext and side-channel attacks
  • valuation attacks in data markets

These examples provide students with an intuitive understanding of what must be defended and why theory matters.

3. Differential Privacy: From Basics to RDP and Hypothesis Testing

DP occupies several chapters, covering:

  • Laplace and Gaussian mechanisms
  • composition theorems
  • Rényi DP
  • DP-SGD
  • framing DP through the lens of hypothesis testing

This was one of the most extensive parts of the rewriting process, as I attempted to unify multiple strands of the privacy literature into one narrative.
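
To give a flavor of the mechanisms these chapters develop, here is a minimal Python sketch of the Laplace mechanism for a counting query. It is an illustration only (the function name, interface, and toy data are mine, not the book's notation): a counting query has sensitivity 1, so adding Laplace(1/ε) noise yields ε-differential privacy.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng=None):
    """epsilon-DP count of records satisfying `predicate`.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace(1/epsilon) noise
    suffices for epsilon-differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for x in data if predicate(x))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: privately count how many ages in a toy dataset exceed 40.
ages = [23, 45, 31, 52, 67, 29, 41]
print(laplace_count(ages, lambda a: a > 40, epsilon=0.5))
```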

4. Lower Bounds in Differential Privacy

Another major contribution of the book is its treatment of lower bounds:

  • packing arguments
  • fingerprinting codes
  • mutual-information-based bounds
  • connections to group privacy

These tools help readers understand the inherent limitations of privacy guarantees.

5. Statistical Estimation, Testing, and Machine Learning Under DP

Later chapters connect DP mechanisms to classical statistical tasks:

  • mean/variance estimation
  • linear regression
  • hypothesis testing
  • utility tradeoffs

Each topic demonstrates how information-theoretic reasoning guides algorithm design.

6. Privacy in Distributed Systems: LDP, Shuffling, MPC, FL

This chapter weaves together local differential privacy and secure multiparty computation—two topics rarely unified in a single textbook:

  • randomized response and k-ary LDP
  • shuffle model and ESA (encode, shuffle, analyze)
  • MPC definitions and protocols
  • secure summation
  • federated learning with DP
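
To make the local model concrete, here is a minimal sketch of binary randomized response with a debiased aggregate; the interface and toy data are my own illustration, not the chapter's presentation.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Binary randomized response: report the true bit with probability
    e^eps / (e^eps + 1), otherwise flip it. Each report satisfies eps-LDP."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def debias_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    noisy_mean = sum(reports) / len(reports)
    return (noisy_mean - (1 - p)) / (2 * p - 1)

# Each user randomizes locally; the aggregator only ever sees noisy bits.
true_bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 100
reports = [randomized_response(b, epsilon=1.0) for b in true_bits]
print(debias_mean(reports, epsilon=1.0))  # roughly 0.6
```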

7–10. Zero-Knowledge Proofs and Information-Theoretic Proof Systems

These chapters form a complete narrative arc:

  • classical ZK protocols (3-coloring, GI)
  • statistical zero-knowledge and SZK-complete problems
  • multi-verifier SZK
  • ZK over secret-shared data
  • linear PCPs and IOPs
  • polynomial commitments and inner-product arguments

11. Multi-Party Differential Privacy

A modern and emerging topic, combining cryptographic and information-theoretic privacy:

  • adversary models
  • distributed noise-addition protocols
  • MPC-based DP
  • simulation and composition theorems

This chapter, in my opinion, is one of the most forward-looking in the book. (I have some active research projects in this space.)
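
As a rough illustration of the distributed noise-addition idea (a simplification of mine, with the secure-summation step abstracted to a plain sum), each of k parties can add Gaussian noise of variance σ²/k to its own value, so that only the aggregate carries the full σ² a central mechanism would add.

```python
import numpy as np

def distributed_gaussian_sum(local_values, sigma_total, rng=None):
    """Each party adds Gaussian noise with variance sigma_total**2 / k to its
    own value; the noise in the (securely computed) sum then has variance
    sigma_total**2, matching a central Gaussian mechanism. The MPC /
    secure-summation step is abstracted away here as an ordinary sum.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(local_values)
    sigma_local = sigma_total / np.sqrt(k)
    noisy_shares = [v + rng.normal(0.0, sigma_local) for v in local_values]
    return sum(noisy_shares)

# Five parties, each holding a local count; the aggregate is noised as if a
# trusted curator had added N(0, sigma_total**2) once.
print(distributed_gaussian_sum([12, 7, 9, 15, 11], sigma_total=2.0))
```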

12. Quantum Cryptography

A full chapter on quantum mechanics and its cryptographic implications, featuring:

  • the photon-polaroid experiment
  • superposition, entanglement, and measurement
  • Shor’s algorithm
  • QKD (BB84)
  • pure vs. mixed states

This chapter offers both intuitive and formal perspectives.

13. Watermarking, Steganography, and AI Content

The final chapter bridges classical information hiding with generative AI:

  • perceptual models and robustness
  • spread-spectrum and QIM watermarking
  • deep-learning-based steganography
  • watermarking of large generative models
  • the randomness used for sampling

This connects the field’s classical roots to current and future security challenges.

Acknowledgements

I developed the bulk of the course materials for the accompanying course during my postdoc, while supported by a Simons Junior Fellowship from the Simons Foundation (965342, D.A.). I am deeply grateful for this support; it gave me the intellectual space to design the course, think deeply about its structure, and begin drafting what would become this book.

This book would not have been possible without the support of my colleagues at UIUC, especially in the Department of Electrical and Computer Engineering. Many colleagues provided helpful feedback while I was developing the materials, attended some class sessions where I tested parts of the exposition, or offered valuable insights on how to structure complex topics such as zero-knowledge proofs, differential privacy, and information-theoretic analyses. Their encouragement and technical discussions greatly shaped the final form of the text.

I will, most likely, update the textbook every time I teach a subset of the topics covered!

Tradeoffs Matter: On Developing Lower Bounds

As I write this blog post, I just received news that the U.S. is designating Nigeria as a ‘country of particular concern’ over the persecution of Christians. I cannot think of any other country with such a large population but (almost) equal representation of Christians and Muslims. Since before I was born, this has caused a lot of friction, yet somehow the country has survived all that friction (thus far!). The friction stems from the tradeoffs of such religious heterogeneity, a topic for its own discussion. In this post, though, I’ll focus on mathematical tradeoffs.

There’s something deeply human about lower bounds. They’re not just mathematical artifacts; they’re reflections of life itself. To me, a lower bound represents the minimum cost of achieving something meaningful. And in both life and research, there’s no escaping those costs.


The Philosophy: Tradeoffs Everywhere

Growing up, I was lucky enough to have certain people in my family spend endless hours guiding me: helping with schoolwork, teaching patience, pushing me toward growth. Looking back, I realize these people could have done a hundred other things with that time. But they (especially my mum) chose to invest it in me. That investment wasn’t free. It came with tradeoffs, the time she could never get back. But without that investment, I wouldn’t be who I am today.

That’s the thing about life: everything has a cost. In 2022/2023, I could have focused entirely on my research. But instead, I poured my energy into founding NaijaCoder, a technical education nonprofit for Nigerian students. It was rewarding, but also consuming. I missed out on months of uninterrupted research momentum. And yet, I have no regrets! Because that, too, was a lower bound. The minimum “cost” of building something (hopefully) lasting and impactful.

Every meaningful pursuit (i.e., love, growth, service, research) demands something in return. There are always tradeoffs. Anyone who claims otherwise is ignoring the basic laws of nature. You can’t have everything, and that’s okay. The beauty lies in understanding what must be given up for what truly matters.


Lower Bounds in Technical Domains

Mathematicians talk about lower bounds as the limits of efficiency. When we prove a lower bound, we’re saying: “No matter how clever you are, you can’t go below this.” It’s not a statement of despair: it’s a statement of truth.

Lower bounds define the terrain of possibility. They tell us what’s fundamentally required to solve a problem, whether it’s time, space, or communication. In a strange way, they remind me of the constraints in life. You can’t do everything at once. There’s a cost to progress. To prove a good lower bound is to acknowledge that reality has structure. There’s an underlying balance between effort and outcome. It’s an act of intellectual honesty: a refusal to pretend that perfection is free.

Nowhere do these tradeoffs feel more personal than in privacy. In the quest to protect individual data, we face the same universal truth: privacy has a price. If we want stronger privacy guarantees, we must give up some accuracy, some utility, some convenience. In differential privacy, lower bounds quantify that tension. They tell us that no matter how sophisticated our algorithms are, we can’t perfectly protect everyone’s data and keep every detail of the dataset intact. We must choose what to value more — precision of statistical estimates or protection.
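
A standard worked example (not tied to any particular result in the book) makes the tension concrete: for releasing the mean of n bits, the Laplace mechanism pays an error of about 1/(εn), and packing-style arguments show that error of roughly that order is unavoidable.

```latex
% Privacy-accuracy tradeoff for the empirical mean of n bits under eps-DP.
% The empirical mean has sensitivity 1/n, so the Laplace mechanism releases
\[
  M(x) = \bar{x} + \mathrm{Lap}\!\left(\tfrac{1}{\varepsilon n}\right),
  \qquad
  \mathbb{E}\,\bigl|M(x) - \bar{x}\bigr| = \frac{1}{\varepsilon n},
\]
% while any eps-DP estimator must incur error (up to constants)
\[
  \Omega\!\left(\tfrac{1}{\varepsilon n}\right),
\]
% so stronger privacy (smaller eps) necessarily forces larger error.
```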

These aren’t technical inconveniences; they’re moral lessons. They remind us that every act of preservation requires some loss. Just as my mother’s care required time, or NaijaCoder required research sacrifices, protecting privacy requires accepting imperfection.

Acceptance

The pursuit of lower bounds (in research or in life) is about humility. It’s about recognizing that limits aren’t barriers to doing good work; they’re the context in which doing good becomes possible.

Understanding lower bounds helps us stop chasing the illusion of “free perfection.” It helps us embrace the world as it is: a world where tradeoffs are natural, where effort matters, and where meaning is found not in escaping limits but in working within them gracefully.

So, whether in mathematics, privacy, or life, the lesson is the same: there are always tradeoffs. And that’s not a tragedy; it’s the very structure that gives our choices value.

I hope these ideas shape how I live, teach, and do research going forward. In my work on privacy, I’m constantly reminded that (almost) every theorem comes with a cost and that understanding those costs makes systems more honest and human. In education, through NaijaCoder, I see the same principle: every bit of growth for a student comes from someone’s investment of time and care.

Developing lower bounds isn’t just a mathematical pursuit. It’s a philosophy of life, one that teaches patience, realism, and gratitude. The world is full of limits, but within those limits, we can still create beauty, meaning, and progress: one bounded step at a time.

When Influence Scores Betray Us: Efficiently Attacking Memorization Scores

tl;dr 👉 We just put out work on attacking influence-based estimators in data markets. The student lead (who did most of the work) is Tue Do! Check it out. Accurate models are not enough. If the auditing tools we rely on can be fooled, then the trustworthiness of machine learning is on shaky ground.

Modern machine learning models are no longer evaluated solely by their training or test accuracy. Increasingly, we ask:

  • Which training examples influenced a particular prediction?
  • How much does the model rely on each data point?
  • Which data are most valuable, or most dangerous, to keep?

Answering these questions requires influence measures, which are mathematical tools that assign each training example a score reflecting its importance or memorization within the model. These scores are already woven into practice: they guide data valuation (identifying key examples), dataset curation (removing mislabeled or harmful points), privacy auditing (tracking sensitive examples), and even data markets (pricing user contributions).

But here lies the problem: what if these influence measures themselves can be attacked? In our new paper, Efficiently Attacking Memorization Scores, we show that they can. Worse, the attacks are not only possible but efficient, targeted, and subtle.


Memorization Scores: A Primer

A memorization score quantifies the extent to which a training example is “remembered” by a model. Intuitively:

  • A point has a high memorization score if the model depends heavily on it (e.g., removing it would harm performance on similar examples).
  • A low score indicates the model has little reliance on the point.

Formally, scores are often estimated through:

  • Leave-one-out retraining (how accuracy changes when a point is removed).
  • Influence functions (approximating parameter sensitivity).
  • Gradient similarity measures (alignment between gradients of a point and test loss).

Because these estimators are computationally heavy, practical implementations rely on approximations, which (one could argue) introduce new fragilities.
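
As one concrete instance of the gradient-similarity family, here is a minimal numpy sketch of a single-checkpoint, TracIn-style score for logistic regression; the model, data, and function names are illustrative assumptions of mine, not the estimators analyzed in our paper.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Per-example gradient of the logistic loss at weights w."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def gradient_similarity_scores(w, X_train, y_train, x_test, y_test):
    """Score each training point by the dot product of its loss gradient
    with the test point's loss gradient (a single-checkpoint,
    TracIn-style approximation of influence)."""
    g_test = logistic_grad(w, x_test, y_test)
    return np.array([logistic_grad(w, x, y) @ g_test
                     for x, y in zip(X_train, y_train)])

# Toy example: which training points most influence a given test prediction?
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))
y_train = (X_train[:, 0] > 0).astype(float)
w = rng.normal(size=5)  # stand-in for trained weights
scores = gradient_similarity_scores(w, X_train, y_train, X_train[0], y_train[0])
print(scores.round(3))
```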


The Adversarial Setting

We consider an adversary whose goal is to perturb training data so as to shift memorization scores in their favor. Examples include:

  • Data market gaming (prime motivation): A seller inflates the memorization score of their data to earn higher compensation.
  • Audit evasion: A harmful or mislabeled point is disguised by lowering its score.
  • Curation disruption: An attacker perturbs examples so that automated cleaning pipelines misidentify them as low-influence.

Constraints:

An attack could aim to satisfy several conditions, but we focus on two:

  1. Efficiency: The method must scale to modern, large-scale datasets.
  2. Plausibility: Model accuracy should remain intact, so the manipulation is not caught by standard validation checks.

The Pseudoinverse Attack

Our core contribution is a general, efficient method called the Pseudoinverse Attack: (1) Memorization scores, though nonlinear in general, can be locally approximated as a linear function of input perturbations; this mirrors how influence functions linearize parameter changes. (2) We solve an inverse problem (specified in the paper): we compute approximate gradients linking input perturbations to score changes, use the pseudoinverse of this linear map to find efficient perturbations, and apply them selectively to target points. This avoids full retraining for each perturbation and yields perturbations that are both targeted and efficient.
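
To convey only the shape of the linear-algebra step (the actual objective, constraints, and gradient estimation are specified in the paper), here is a toy numpy sketch: treat the targeted score changes as approximately linear in the input perturbation, J·δ ≈ desired shift, and take δ = J⁺ · (desired shift) via the Moore–Penrose pseudoinverse, which yields the minimum-norm perturbation realizing that shift in the linearized model. All numbers below are hypothetical.

```python
import numpy as np

def pseudoinverse_perturbation(score_jacobian, target_shift):
    """Toy sketch of the linear-algebra step: given an (approximate)
    Jacobian J mapping an input perturbation delta to changes in the
    targeted memorization scores (J @ delta ≈ shift), the Moore-Penrose
    pseudoinverse gives the minimum-norm delta realizing target_shift."""
    return np.linalg.pinv(score_jacobian) @ target_shift

# Hypothetical numbers: 3 targeted scores, a 10-dimensional (flattened)
# input perturbation, and a desired shift for each targeted score.
rng = np.random.default_rng(0)
J = rng.normal(size=(3, 10))            # approximate score/input Jacobian
target_shift = np.array([0.2, -0.1, 0.15])
delta = pseudoinverse_perturbation(J, target_shift)

print(np.linalg.norm(delta))            # small perturbation...
print(J @ delta)                        # ...that realizes the desired shift
```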


Validation

We validate the attack on image classification tasks (e.g., CIFAR benchmarks) with standard architectures (CNNs, ResNets).

Key Findings

  1. High success rate: Target scores can be reliably increased or decreased.
  2. Stable accuracy: Overall classification performance remains essentially unchanged.
  3. Scalability: The attack works even when applied to multiple examples at once.

Example (Score Inflation): A low-memorization image (e.g., a benign CIFAR airplane) is perturbed. After retraining, its memorization score jumps into the top decile, without degrading accuracy on other examples. This demonstrates a direct subversion of data valuation pipelines.


Why This Is Dangerous

The consequences ripple outward:

  • Data markets: Compensation schemes based on memorization become easily exploitable.
  • Dataset curation: Automated cleaning fails if adversaries suppress scores of mislabeled or harmful points.
  • Auditing & responsibility: Legal or ethical frameworks built on data attribution collapse under adversarial pressure.
  • Fairness & privacy: Influence-based fairness assessments are no longer trustworthy.

If influence estimators can be manipulated, the entire valuation-based ecosystem is at risk.

Conclusion

This work sits at the intersection of adversarial ML and interpretability:

  • First wave: Adversarial examples, i.e., perturbing inputs to fool predictions.
  • Second wave: Data poisoning and backdoor attacks, i.e., perturbing training sets to corrupt models.
  • Third wave (our focus): Attacks on the auditing layer, i.e., perturbing training sets to corrupt pricing and interpretability signals without harming predictive accuracy.

This third wave is subtle but potentially more damaging: if we cannot trust influence measures, then even “good” models become opaque and unaccountable. As machine learning moves toward explainability and responsible deployment, securing the interpretability layer is just as critical as securing models themselves.

Our paper reveals a new adversarial frontier: efficiently manipulating memorization scores.

  • We introduce the Pseudoinverse Attack, an efficient, targeted method for perturbing training points to distort influence measures.
  • We show, supported by theory and experiments, that memorization scores are highly vulnerable, even under small, imperceptible perturbations.
  • We argue that this undermines trust in data valuation, fairness, auditing, and accountability pipelines.