Multi-Bit Watermarking

tl;dr Recently, my research group and I have been collaborating with scientists at Amazon to design, implement, and test multi-bit watermarking schemes. We will likely release our pre-print soon. But until then, in this blog post, I present some thoughts on multi-bit text watermarking.

Text watermarking can be introduced as a digital provenance tool: a model provider slightly changes generation so that later, with the right detector, the model provider can tell (preferably with overwhelmingly high confidence) whether some generated text came from its model. That is the zero-bit version: the hidden payload or message is just “watermark present.” The more interesting version is multi-bit watermarking, where the generated text carries a message: a user ID, model ID, timestamp, content-policy tag, licensing tag, or audit trail.

That extra payload is useful, but it creates a fundamental tension. The process of embedding more bits requires the output distribution to depend more strongly on the hidden message; stronger dependence makes decoding easier, but also makes the text easier to distinguish from ordinary model output. Our latest work (to be released soon!) states this tradeoff directly: higher payloads tend to increase detectability, while stronger distortion-free requirements reduce achievable rates.

This is not a brand new conceptual problem. It is the LLM version of a much older question in information hiding, data hiding, steganography, and digital watermarking: how many bits can be hidden in a host object while preserving some notion of distributional fidelity, stealth, or robustness? Moulin and O’Sullivan’s information-theoretic analysis describes information hiding as hiding information in a host data set so that it can be reliably communicated to a receiver, covering watermarking, fingerprinting, steganography, and data embedding [1]. Cachin’s information-theoretic steganography model casts the adversary’s task as a hypothesis test between innocent cover messages and stego messages, with security quantified through distributional divergence [2]. Chen and Wornell’s quantization-index modulation work studies embedding a signal, such as a digital watermark, inside a host signal to form a composite signal, with provable rate-distortion-robustness behavior [3].

The LLM twist is that the “host” is not a fixed image, audio clip, or document. It is a next-token distribution. At each generation step, the base model gives a distribution $Q_t$ over the vocabulary, and the watermarker chooses a nearby distribution $Q_t^\star$ from which it samples. This turns multi-bit text watermarking into a channel coding problem with a distributional constraint.

From zero-bit detection to multi-bit communication

A zero-bit watermark asks:

$H_0: X_{1:n} \sim$ ordinary model versus $H_1: X_{1:n} \sim$ watermarked model.

A multi-bit watermark asks a stronger question: can the message be recovered after passing through the following chain?

$M \in \{0,1\}^k \quad \longrightarrow \quad X_{1:n} \quad \longrightarrow \quad \widehat M,$

where $M$ is the hidden message, $X_{1:n}$ is the generated text, and $\widehat M$ is the decoded message. The receiver may know a secret key, a public codebook, and perhaps the base model. The adversary may try to detect, remove, forge, or corrupt the watermark.

Early LLM watermarking work focused largely on zero-bit detection. Kirchenbauer et al.’s watermark, for example, randomly selects a set of “green” tokens before each token is generated and softly promotes them during sampling, giving an efficient statistical detector [4]. SynthID-Text is another prominent watermarking system for LLM outputs, described as preserving text quality while enabling efficient detection with minimal latency overhead [5]. Multi-bit schemes move beyond detection into payload extraction; for example, multi-bit text watermarking methods have been proposed for traceability, robust extraction, and paraphrase-resilient embedding.

Our work frames this shift using an information-theoretic channel view: reliable message recovery is governed by the conditional mutual information of the induced watermark channel, and different distortion regimes create different capacity-detectability frontiers.

The basic channel model

Let $\mathcal X$ be the vocabulary. At time $t$, the base LLM gives

$Q_t \in \Delta(\mathcal X),$

a next-token distribution conditioned on the previous context. A multi-bit watermark has:

$M \in \{0,1\}^k$

as the message, and a secret key or side information $S_t$. The encoder maps $(M,S_t)$ into a state

$Z_t = f_t(M,S_t),$

then samples

$X_t \sim P_{X_t|Z_t=z}.$

The key design constraint is that each conditional law $P_{X_t|Z_t=z}$ should remain close to the base law $Q_t$. To quantify closeness, one could use the total variation distance $d_{\mathrm{TV}}(P,Q) = \frac12 \sum_{x \in \mathcal X} |P(x)-Q(x)|$. A watermarked distribution is called uniformly per-key $\varepsilon$-distortion-free when, for every time $t$, message $m$, and realized key $s$, the conditional token distribution stays within $\varepsilon$ total variation of the base distribution: $d_{\mathrm{TV}}(P_{X_t|Z_t=f_t(m,s)}, Q_t) \le \varepsilon$.

This definition matters because total variation has an operational meaning: it is exactly the largest possible distinguishing advantage of any detector trying to decide whether a sample came from $P$ or $Q$ (here, from $P_{X_t|Z_t=z}$ or from $Q_t$).
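To make the constraint concrete, here is a minimal Python sketch of one way to build a uniformly per-key $\varepsilon$-distortion-free channel. Every name and design choice in it is an illustrative assumption, not the scheme from our work. The hash-based `keyed_partition` stands in for the side information $S_t$. The hypothetical `encoder_state` plays the role of $f_t(M,S_t)$ by picking the keyed half of the vocabulary selected by the current message bit. `watermarked_law` then moves at most $\varepsilon$ probability mass onto that half, in the spirit of green-list tilting.

```python
import hashlib

import numpy as np


def tv_distance(p, q) -> float:
    """d_TV(P, Q) = 1/2 * sum_x |P(x) - Q(x)|."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())


def keyed_partition(key: bytes, t: int, vocab_size: int) -> np.ndarray:
    """Hypothetical side information S_t: a pseudorandom half of the vocabulary
    (boolean mask) derived from a hash of (key, t)."""
    digest = hashlib.sha256(key + t.to_bytes(8, "big")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    mask = np.zeros(vocab_size, dtype=bool)
    mask[rng.permutation(vocab_size)[: vocab_size // 2]] = True
    return mask


def encoder_state(message_bits, t: int, key: bytes, vocab_size: int) -> np.ndarray:
    """Hypothetical Z_t = f_t(M, S_t): the keyed half selected by bit M[t mod k]."""
    half = keyed_partition(key, t, vocab_size)
    return half if message_bits[t % len(message_bits)] == 1 else ~half


def watermarked_law(q, favored: np.ndarray, eps: float) -> np.ndarray:
    """Move delta = min(eps, 1 - Q(favored)) mass proportionally onto `favored`.
    Exactly delta mass changes sides, so d_TV(P, Q) = delta <= eps."""
    q = np.asarray(q, dtype=float)
    q_fav = q[favored].sum()
    if q_fav <= 0.0 or q_fav >= 1.0:
        return q.copy()  # no mass can be moved proportionally; keep the base law
    delta = min(eps, 1.0 - q_fav)
    p = q.copy()
    p[favored] *= 1.0 + delta / q_fav
    p[~favored] *= 1.0 - delta / (1.0 - q_fav)
    return p


rng = np.random.default_rng(0)
vocab_size, eps, key = 50, 0.05, b"secret-key"
q_t = rng.dirichlet(np.ones(vocab_size))  # toy stand-in for the base law Q_t
message = [1, 0, 1, 1]
for t in range(8):
    p_t = watermarked_law(q_t, encoder_state(message, t, key, vocab_size), eps)
    assert tv_distance(p_t, q_t) <= eps + 1e-12  # per-key eps-distortion-free
    x_t = rng.choice(vocab_size, p=p_t)  # sample X_t ~ P_{X_t | Z_t = z}
```

A decoder holding the key could count how often sampled tokens land in each keyed half at each message position and take a majority vote. With a small $\varepsilon$, that vote needs many tokens per bit, which is the payload-versus-detectability tension in miniature.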

Total variation is maximum distinguishing advantage

Let $P$ and $Q$ be distributions on a finite alphabet $\mathcal X$. A detector is a function $\phi:\mathcal X \to \{0,1\}$. Its distinguishing advantage is $\Pr_{X\sim P}[\phi(X)=1] - \Pr_{X\sim Q}[\phi(X)=1]$.

For a fixed detector,

$\mathbb E_P[\phi(X)]-\mathbb E_Q[\phi(X)] = \sum_x \phi(x)(P(x)-Q(x)).$

Let

$A = \{x : P(x) \ge Q(x)\}.$

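Since $0 \le \phi(x) \le 1$ and $P(x)-Q(x) < 0$ for every $x \notin A$, the sum is largest when $\phi = 1$ on $A$ and $\phi = 0$ off $A$: $\sum_x \phi(x)(P(x)-Q(x)) \le \sum_{x\in A} (P(x)-Q(x)).$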

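Because $\sum_x (P(x)-Q(x)) = 1 - 1 = 0$, the positive part of $P-Q$ (summed over $A$) equals the negative part (summed over the complement of $A$), so each is half of the total absolute difference: $\sum_{x\in A} (P(x)-Q(x)) = \frac12 \sum_x |P(x)-Q(x)| = d_{\mathrm{TV}}(P,Q).$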

The upper bound is achieved by the detector $\phi^\star(x)=1\{x\in A\}$. Running the same argument with the roles of $P$ and $Q$ swapped bounds the advantage in the other direction, which is why the absolute value appears. Therefore, $\sup_\phi \left| \Pr_{P}[\phi(X)=1]-\Pr_Q[\phi(X)=1]\right| = d_{\mathrm{TV}}(P,Q).$

So a per-token TV budget is a bound on the best possible one-token test. In our work, we have been using the TV budget to implement watermarking schemes and test the detectability and message-recoverability of such schemes.
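The derivation above can be checked numerically on a tiny alphabet. This Python sketch uses toy distributions chosen only for illustration. It enumerates all $2^{|\mathcal X|}$ deterministic detectors, takes the best absolute advantage, and compares it with $d_{\mathrm{TV}}(P,Q)$ and with the advantage of $\phi^\star(x)=1\{x\in A\}$.

```python
from itertools import product

import numpy as np


def tv_distance(p: np.ndarray, q: np.ndarray) -> float:
    """d_TV(P, Q) = 1/2 * sum_x |P(x) - Q(x)|."""
    return 0.5 * float(np.abs(p - q).sum())


def advantage(phi: np.ndarray, p: np.ndarray, q: np.ndarray) -> float:
    """Pr_P[phi(X)=1] - Pr_Q[phi(X)=1] = sum_x phi(x) (P(x) - Q(x))."""
    return float(phi @ (p - q))


def best_detector_advantage(p: np.ndarray, q: np.ndarray) -> float:
    """Brute force over every deterministic detector phi: X -> {0, 1}
    (2^|X| of them, so this is only feasible for tiny alphabets)."""
    return max(abs(advantage(np.array(phi), p, q))
               for phi in product((0, 1), repeat=len(p)))


p = np.array([0.5, 0.2, 0.2, 0.1])
q = np.array([0.3, 0.3, 0.1, 0.3])
phi_star = (p >= q).astype(int)  # indicator of A = {x : P(x) >= Q(x)}
print(tv_distance(p, q), best_detector_advantage(p, q), advantage(phi_star, p, q))
```

All three numbers agree (0.3 for this pair, up to floating-point rounding). The exponential enumeration is only there to confirm that no detector beats $\phi^\star$.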

How this relates to information hiding, data hiding, and steganography

The terminology across communities is inconsistent, but the underlying mathematical formulation is (fairly) stable.

Information hiding is the broad umbrella. There is a host object, a hidden message, a distortion or detectability constraint, and a receiver. The goal is reliable communication through the host.

Data hiding often emphasizes embedding payload bits into media. The fidelity constraint may be perceptual: image distortion, audio quality, semantic preservation, or edit distance.

Digital watermarking often emphasizes provenance, ownership, authentication, or tracing. The hidden message may be a copyright mark, model identifier, user identifier, or policy tag. Robustness against attacks is often central.

Steganography emphasizes secrecy of the very existence of communication. The adversary’s detection problem is primary. Cachin’s model, with a passive adversary distinguishing cover from stego distributions, is especially close to modern distributional formulations of LLM watermark stealth.

Multi-bit LLM watermarking sits at the intersection. It is data hiding because it embeds a payload. It is watermarking because the payload is usually provenance or attribution metadata. It is steganography when the watermarked output must be statistically indistinguishable from ordinary model output. It is channel coding because reliable recovery is governed by mutual information and error-correcting codes.

Our work explicitly connects LLM watermarking to the information-theoretic literature on watermarking, data hiding, and steganography, noting that classical work models watermarking as communication over a constrained channel and that the LLM setting replaces perceptual host distortion with distributional constraints on the base next-token law.

The central frontier

In my opinion, a good multi-bit watermark should satisfy at least three of these:

Payload: many bits per token.
Reliability: low message error after generation and possible edits.
Distortion-freeness/Stealth/Low-detectability: small statistical distance from ordinary model output.
Utility: low degradation of fluency, factuality, reasoning, and user experience.

Core results in information theory imply that these cannot all be maximized simultaneously. Stronger distortion-free constraints reduce mutual information. More robustness requires redundancy. More redundancy lowers rate. More aggressive perturbations improve decoding but increase detectability and may degrade quality.
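The first of these tradeoffs shows up in a toy calculation. Assume, for illustration only, a uniform base law $Q$ over ten tokens, a fixed keyed half $G$, and a uniform binary state $Z$. Each of its two conditional laws tilts at most $\varepsilon$ of mass toward $G$ or toward its complement, so each stays within $\varepsilon$ total variation of $Q$. The Python sketch computes $I(Z;X)$, the per-token information available to a decoder, as $\varepsilon$ shrinks.

```python
import numpy as np


def tilt(q: np.ndarray, favored: np.ndarray, eps: float) -> np.ndarray:
    """Move delta = min(eps, 1 - Q(favored)) mass proportionally onto `favored`,
    so d_TV(result, q) = delta <= eps. Assumes 0 < Q(favored) < 1."""
    q_fav = q[favored].sum()
    delta = min(eps, 1.0 - q_fav)
    p = q.copy()
    p[favored] *= 1.0 + delta / q_fav
    p[~favored] *= 1.0 - delta / (1.0 - q_fav)
    return p


def kl(p: np.ndarray, m: np.ndarray) -> float:
    """KL(p || m) in nats, skipping tokens with p(x) = 0."""
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / m[nz])))


def mutual_information(p0: np.ndarray, p1: np.ndarray) -> float:
    """I(Z; X) in nats for a uniform binary Z with laws p0 = P_{X|Z=0}, p1 = P_{X|Z=1}."""
    mix = 0.5 * (p0 + p1)
    return 0.5 * kl(p0, mix) + 0.5 * kl(p1, mix)


vocab_size = 10
q = np.full(vocab_size, 1.0 / vocab_size)       # toy base law Q (uniform, an assumption)
half = np.arange(vocab_size) < vocab_size // 2  # toy keyed half G
for eps in (0.01, 0.05, 0.1, 0.2):
    p1, p0 = tilt(q, half, eps), tilt(q, ~half, eps)
    print(f"eps={eps:.2f}  I(Z;X)={mutual_information(p0, p1):.6f} nats")
```

In this toy, $I(Z;X) = \log 2 - h(\tfrac12+\varepsilon)$ nats, where $h$ is the binary entropy. For small $\varepsilon$ this behaves like $2\varepsilon^2$, so halving the distortion budget cuts the per-token information by roughly a factor of four.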

The right question is not “Can we make a perfect multi-bit watermark?” It is: What is the best achievable rate at a given detectability, robustness, and quality budget?

That is a capacity question!
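A standard textbook calculation shows why redundancy alone is a poor answer to that question. Abstract each watermarked token as one use of a binary symmetric channel that flips the embedded bit with probability $p$. This is an assumption: real watermark channels are neither binary nor memoryless. A length-$r$ repetition code with majority decoding has rate $1/r$, and its bit-error probability falls only as the rate goes to zero. Shannon's theorem, by contrast, says that any rate below $C = 1 - h_2(p)$ bits per use is achievable with vanishing error by suitable codes.

```python
import math


def binary_entropy(p: float) -> float:
    """h_2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p: float) -> float:
    """Capacity of a binary symmetric channel with crossover probability p, in bits per use."""
    return 1.0 - binary_entropy(p)


def repetition_error(r: int, p: float) -> float:
    """Probability that majority decoding of an odd-length-r repetition code fails,
    i.e. that more than r/2 of the r copies are flipped."""
    assert r % 2 == 1, "use an odd length so the majority vote has no ties"
    return sum(math.comb(r, k) * p**k * (1 - p) ** (r - k) for k in range(r // 2 + 1, r + 1))


p = 0.1
print(f"capacity C = {bsc_capacity(p):.3f} bits/token")
for r in (1, 3, 5, 9, 15):
    print(f"repetition r={r:2d}: rate={1 / r:.3f}, bit error={repetition_error(r, p):.2e}")
```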

References

[1] Pierre Moulin and Joseph A. O’Sullivan. Information-theoretic analysis of information hiding. IEEE Trans. Inf. Theory, 49(3):563–593, 2003.

[2] Christian Cachin. An information-theoretic model for steganography. Information and Computation, 192(1):41–56, 2004.

[3] B. Chen and G. W. Wornell. Quantization index modulation: a class of provably good methods for digital watermarking and information embedding. IEEE Trans. Inf. Theory, 47(4):1423–1443, May 2001.

[4] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In Proceedings of the 40th International Conference on Machine Learning, 2023.

[5] Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, Jamie Hayes, Nidhi Vyas, Majd Merey, Jonah Brown-Cohen, Rudy Bunel, Borja Balle, Ali Cemgil, Zahra Ahmed, Kitty Stacpoole, and Pushmeet Kohli. Scalable watermarking for identifying large language model outputs. Nature, 634:818–823, 2024.

Auditing Differential Privacy via Donsker-Varadhan Representations

tl;dr One of my graduating students, Benjamin D. Kim, will be presenting a chapter of the M.S. thesis he completed at the University of Illinois at Urbana-Champaign (UIUC)! This work, on Rényi DP auditing, will appear at the TPDP 2026 workshop (and again at an ICML workshop this summer). Ben begins his Ph.D. at MIT EECS in the fall; I wish him the best of luck as he begins this new journey. Here is the arXiv posting of the paper.

Differential privacy has become one of the dominant frameworks for protecting sensitive information in machine learning. In its ideal form, a differentially private algorithm guarantees that the presence or absence of any single person’s data has only a limited effect on the distribution of outputs. This is a powerful promise: even if an adversary sees the trained model, the released statistics, or some downstream prediction, they should not be able to infer too much about any one individual in the training set.

But as differentially private machine learning systems move from theory into practice, a natural question arises:

How do we know that a system claiming differential privacy is actually private?

That question is the starting point of our paper. The paper develops a black-box auditing framework for machine learning algorithms that claim Rényi differential privacy (RDP). The key idea is to treat privacy auditing as a statistical estimation problem: run the mechanism on neighboring datasets, observe its outputs, and estimate the Rényi divergence between the resulting output distributions. The paper’s main technical contribution is to do this using the Donsker–Varadhan variational representation of Rényi divergence, implemented with neural estimators inspired by MINE, and to prove finite-sample confidence guarantees and matching minimax lower bounds.

The paper contributes to the literature (e.g., see [3], [4], [5]) that moves privacy auditing closer to the role that cryptanalysis plays in cryptography: a necessary discipline for stress-testing, validating, and understanding the real security of deployed systems.

Why differential privacy needs auditing

A differential privacy guarantee is a theorem/lemma about a (randomized) mechanism. If an algorithm is correctly implemented, if the analysis is tight, if the accounting is correct, if the randomness is generated properly, and if all modeling assumptions are satisfied, then the promised privacy parameter should hold.

That is a lot of “ifs.”

In modern private machine learning, the most common algorithmic workhorse is DP-SGD: differentially private stochastic gradient descent. DP-SGD clips per-example gradients, adds Gaussian noise, and composes privacy loss over many training steps. In practice, privacy accounting is often expressed using Rényi differential privacy because RDP composes cleanly and is central to modern privacy accountants. The paper emphasizes that DP-SGD and related private learning systems are routinely deployed with RDP guarantees because of these tight composition properties.

However, the existence of a theoretical privacy analysis does not eliminate the need for empirical validation. Auditing matters for at least four reasons.

First, implementations can be wrong. Gradient clipping may be misapplied, random seeds may be mishandled, batching may differ from the analyzed model, or the privacy accountant may be used incorrectly. A privacy theorem protects the algorithm as specified, not necessarily the code that is actually deployed.

Second, privacy analyses can be loose or wrong. A correct analysis may give an upper bound far above the true leakage, and a flawed analysis may claim a bound that the true leakage actually exceeds. Auditing gives empirical lower bounds on the true leakage and helps assess whether the accounting is informative.

Third, privacy claims need external validation. In deployed systems, users, regulators, and scientific reviewers may not be satisfied with a claimed value of $\varepsilon$. They may want evidence that the implementation behaves as advertised.

Fourth, auditing can reveal the operational meaning of privacy parameters. Even when a theoretical guarantee is correct, it may be difficult to interpret. Empirical attacks and audits help translate abstract divergence bounds into observable distinguishability.

This is why our paper frames privacy auditing as the counterpart to privacy accounting. Privacy accounting gives an upper bound: “the mechanism should leak at most this much.” Privacy auditing gives a lower bound: “we can empirically demonstrate at least this much leakage.” A good auditing method narrows the gap between these two quantities.

DP auditing as cryptanalysis for private machine learning

A useful analogy is with cryptography.

In cryptography, a proposed encryption scheme, signature scheme, or zero-knowledge protocol is not considered trustworthy merely because its designers believe it is secure. The community tries to break it. Cryptanalysts look for distinguishing attacks, key-recovery attacks, side-channel attacks, malleability attacks, and implementation vulnerabilities. A failed attack does not prove security, but strong cryptanalysis increases confidence; a successful attack exposes a gap between the claimed security and the actual behavior.

DP auditing plays a similar role for privacy-preserving machine learning.

A claimed DP mechanism is like a cryptographic construction. Its privacy proof is like a security reduction or theorem. An audit is like an attack: the auditor tries to distinguish whether a specific individual’s data was included in training. If the auditor can distinguish the “in” world from the “out” world too well, then the mechanism leaks more information than expected.

The analogy is especially clear in membership-inference audits. The attacker constructs two neighboring datasets: one containing a special record, often called a canary, and one without it. The mechanism is run many times on both datasets. The auditor then observes the outputs and tries to distinguish which dataset was used. If the output distributions are far apart, the canary has left a detectable trace.
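One classical way to turn this in/out game into a number is used by earlier $(\varepsilon,\delta)$-style audits, not by our paper. It applies the $(\varepsilon,\delta)$-DP inequality $\Pr[M(D)\in S]\le e^{\varepsilon}\Pr[M(D')\in S]+\delta$ to the single event $S$ = “the test says the canary was in.” That gives $\mathrm{TPR} \le e^{\varepsilon}\,\mathrm{FPR} + \delta$, so $\varepsilon \ge \log\big((\mathrm{TPR}-\delta)/\mathrm{FPR}\big)$. The Python sketch below applies a loss-threshold test to two hypothetical arrays of canary losses, with pessimistic one-sided Hoeffding corrections on both rates. Real audits typically use tighter Clopper–Pearson intervals. They also pick the threshold on held-out runs, because choosing it on the same data, as this sketch does, voids the per-threshold guarantee.

```python
import math

import numpy as np


def eps_lower_bound(tpr: float, fpr: float, delta: float = 0.0) -> float:
    """(eps, delta)-DP forces tpr <= e^eps * fpr + delta, hence
    eps >= log((tpr - delta) / fpr). Returns 0.0 when this is vacuous."""
    if tpr - delta <= fpr:
        return 0.0
    if fpr <= 0.0:
        return math.inf
    return math.log((tpr - delta) / fpr)


def hoeffding_slack(n: int, beta: float) -> float:
    """One-sided Hoeffding: from n runs, the true rate is at least (resp. at most)
    the empirical rate minus (resp. plus) this slack, w.p. >= 1 - beta each."""
    return math.sqrt(math.log(1.0 / beta) / (2.0 * n))


def threshold_audit(loss_in, loss_out, delta: float = 0.0, beta: float = 0.025) -> float:
    """Test says "canary in" when loss <= tau (a trained-on canary tends to have
    lower loss). Uses a lower bound on TPR and an upper bound on FPR."""
    loss_in, loss_out = np.asarray(loss_in), np.asarray(loss_out)
    s_in = hoeffding_slack(len(loss_in), beta)
    s_out = hoeffding_slack(len(loss_out), beta)
    best = 0.0
    for tau in np.unique(np.concatenate([loss_in, loss_out])):
        tpr_low = float(np.mean(loss_in <= tau)) - s_in
        fpr_high = float(np.mean(loss_out <= tau)) + s_out
        best = max(best, eps_lower_bound(tpr_low, fpr_high, delta))
    return best


rng = np.random.default_rng(0)
loss_in = rng.normal(0.5, 0.3, 500)   # hypothetical canary-in losses
loss_out = rng.normal(1.0, 0.3, 500)  # hypothetical canary-out losses
print(f"empirical epsilon lower bound: {threshold_audit(loss_in, loss_out, delta=1e-5):.2f}")
```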

But there is also an important difference between cryptanalysis and DP auditing. In cryptography, the usual goal is to find any efficient adversary that violates a security definition. In DP auditing, especially in this paper, the goal is more quantitative: estimate a divergence between distributions and attach a statistically valid confidence statement to that estimate. The auditor is not merely saying “I found an attack.” The auditor is saying something closer to:

With high confidence, the Rényi divergence between the mechanism’s outputs on these neighboring datasets is at least this value.

That makes the audit directly comparable to an RDP claim. This is one of the central conceptual advances of the paper. Rather than auditing DP indirectly through a particular attack heuristic, the paper audits the same mathematical object that appears in the privacy definition: the Rényi divergence between neighboring output distributions.

From DP to Rényi DP

Pure/approximate differential privacy says that a randomized mechanism $M$ is $(\varepsilon,\delta)$-DP if, for all neighboring datasets $D,D'$ and all measurable events $S$,

$\Pr[M(D)\in S]\leq e^\varepsilon \Pr[M(D')\in S]+\delta.$

Rényi differential privacy instead bounds the Rényi divergence between the output distributions:

$D_\alpha(M(D)\Vert M(D')) \leq \varepsilon_\alpha,$

for a Rényi order $\alpha>1$. In our paper, we recall this definition and note that RDP can be converted back into approximate DP using the standard conversion: if a mechanism satisfies $(\alpha,\varepsilon_\alpha)$-RDP, then for every $\delta\in(0,1)$ it also satisfies $(\varepsilon_\alpha+\log(1/\delta)/(\alpha-1),\delta)$-DP.

RDP is particularly useful for machine learning because privacy loss accumulates over many iterations of training. DP-SGD may run for hundreds or thousands of steps. Because the RDP parameters of composed mechanisms add up at each fixed order $\alpha$, RDP gives a convenient and often tight way to account for this composition. That is why modern DP-SGD analyses often report privacy through an RDP accountant, even if the final result is converted into $(\varepsilon,\delta)$-DP.

But this creates a mismatch in the auditing literature. Many prior audits were designed around pure or approximate DP, membership inference, data poisoning, or hypothesis testing formulations. These are valuable, but they do not directly estimate the RDP quantity that modern privacy accountants actually track. The paper argues that this is a gap: if deployed systems claim RDP guarantees, then auditors should be able to audit RDP directly.
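The following Python sketch shows how composition and conversion interact for the plain Gaussian mechanism with L2 sensitivity 1 and noise standard deviation $\sigma$. Its RDP curve is the standard $\varepsilon_\alpha = \alpha/(2\sigma^2)$. The sketch adds RDP over $T$ steps, applies the RDP-to-DP conversion stated in the definition above, and keeps the best order from a grid. It ignores the minibatch subsampling that real DP-SGD accountants exploit, so it overstates the $\varepsilon$ such an accountant would report. The values of $\sigma$, $T$, $\delta$, and the grid are arbitrary choices.

```python
import math


def gaussian_rdp(alpha: float, sigma: float, steps: int = 1) -> float:
    """RDP of `steps` adaptively composed Gaussian mechanisms, each with L2
    sensitivity 1 and noise N(0, sigma^2): eps_alpha = steps * alpha / (2 sigma^2)."""
    return steps * alpha / (2.0 * sigma**2)


def rdp_to_dp(alpha: float, eps_alpha: float, delta: float) -> float:
    """(alpha, eps_alpha)-RDP implies (eps_alpha + log(1/delta)/(alpha-1), delta)-DP."""
    return eps_alpha + math.log(1.0 / delta) / (alpha - 1.0)


def best_dp_epsilon(sigma: float, steps: int, delta: float, alphas) -> tuple:
    """Convert at every order in the grid and keep the smallest epsilon
    (returns the pair (epsilon, alpha))."""
    return min((rdp_to_dp(a, gaussian_rdp(a, sigma, steps), delta), a) for a in alphas)


orders = [1.25, 1.5, 2, 3, 4, 8, 16, 32, 64]
eps, order = best_dp_epsilon(sigma=10.0, steps=100, delta=1e-5, alphas=orders)
print(f"({eps:.3f}, 1e-5)-DP, attained at alpha = {order}")
```

The best order is an interior point of the grid: small $\alpha$ makes the $\log(1/\delta)/(\alpha-1)$ term blow up, while large $\alpha$ inflates the composed RDP term. This is why accountants track a whole curve of orders rather than a single one.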

The black-box auditing setting

The paper focuses on black-box auditing. In this setting, the auditor does not inspect the internal gradients, random noise, or intermediate training trajectory. Instead, the auditor can choose inputs, run the training mechanism, and observe outputs or post-processed outputs.

For DP-SGD, the black-box audit proceeds roughly as follows. The auditor chooses a canary example $(x',y')$. The mechanism is trained repeatedly on a dataset $D$ without the canary and on the neighboring dataset $D' = D\cup\{(x',y')\}$ with the canary. After training, the auditor measures the loss of the trained model on the canary. This produces two empirical distributions of losses: one from canary-absent training and one from canary-present training. The audit then estimates the Rényi divergence between these two loss distributions. This is a natural black-box attack: if the model behaves differently on the canary depending on whether it was included in training, then the canary has influenced the learned model. In privacy terms, the output distributions $M(D)$ and $M(D')$ are distinguishable.
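The protocol can be written against any black-box training function. In the Python sketch below, every modeling choice is an illustrative assumption. The mechanism is a deliberately tiny stand-in for DP-SGD: a noisy clipped sum of scalar records, i.e. a Gaussian mechanism. The “model” is the released number, and the loss on the canary is squared error. The two returned arrays play the role of the canary-in and canary-out loss samples that the divergence estimate is computed from.

```python
import numpy as np


def toy_private_release(data: np.ndarray, clip: float, sigma: float, rng) -> float:
    """Hypothetical stand-in for one DP training run: release the clipped sum of
    scalar records plus N(0, (sigma * clip)^2) noise (a Gaussian mechanism)."""
    return float(np.clip(data, -clip, clip).sum() + rng.normal(0.0, sigma * clip))


def collect_canary_losses(train_fn, data, canary, runs, rng):
    """Black-box protocol: run the mechanism `runs` times on D (canary absent) and
    on D' = D + {canary} (canary present), recording a loss on the canary each time."""
    loss_out, loss_in = [], []
    for _ in range(runs):
        theta_out = train_fn(data, rng)
        theta_in = train_fn(np.append(data, canary), rng)
        loss_out.append((theta_out - canary) ** 2)  # squared error as a toy "loss"
        loss_in.append((theta_in - canary) ** 2)
    return np.array(loss_in), np.array(loss_out)


def train(d, r):
    return toy_private_release(d, clip=1.0, sigma=1.0, rng=r)


rng = np.random.default_rng(0)
data = np.zeros(100)  # toy dataset D whose clipped sum is 0
loss_in, loss_out = collect_canary_losses(train, data, canary=1.0, runs=500, rng=rng)
# In expectation the canary-in loss is 1 and the canary-out loss is 2 in this toy.
print(loss_in.mean(), loss_out.mean())
```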

The paper also uses a technique from prior work: worst-case initialization [3]. DP-SGD’s privacy guarantee is unaffected by the choice of initial parameters before private training, so the auditor can choose initial parameters that make training more sensitive to the canary. The paper follows the method of crafting such initializations by pretraining on a separate part of the dataset. This increases the statistical power of the audit while remaining within a black-box threat model. Again, this is analogous to cryptanalysis: a cryptanalyst often chooses adversarial plaintexts, messages, or protocol inputs to expose weaknesses. Here, the privacy auditor chooses a canary and initialization that make the privacy loss easier to detect.

Why estimating Rényi divergence is hard

The central statistical problem is this:

Given samples from two unknown distributions $P$ and $Q$ , estimate or lower bound $D_\alpha(P\Vert Q)$ .

This is challenging because the distributions may be high-dimensional, implicit, and accessible only through samples. In machine learning, the output distribution of a randomized training algorithm may be a distribution over model parameters, predictions, losses, or other post-processed statistics. The density ratio $P(x)/Q(x)$ is not available. Direct plug-in estimation is usually impossible.

This is where variational representations become useful.

A variational representation rewrites a divergence as a supremum over functions. Instead of needing the density ratio explicitly, one searches over a class of critic functions (denoted by $T$ in our work) that try to distinguish samples from $P$ and $Q$ . If the critic class is rich enough, optimizing the variational objective recovers the divergence. If the critic class is restricted, the objective gives a lower bound. This leads to the philosophy behind neural divergence estimation: train a neural network critic to maximize a divergence objective.

The Donsker-Varadhan Representation

The Donsker-Varadhan variational formula [1][2] is a result that expresses certain information-theoretic quantities, most famously KL divergence, as a supremum over test functions. Our paper uses a Rényi-divergence analogue of this variational perspective.

The key insight is that the unknown density ratio is replaced by an optimization problem over functions. In practice, the auditor restricts to a neural network class. The resulting class-restricted objective is a lower bound on the full variational divergence, because the supremum is taken over a restricted class.

For auditing, this lower-bound property is a feature, not a bug. A privacy audit should be conservative. If the neural critic certifies a large lower bound on Rényi divergence, then the true divergence is at least that large, subject to statistical confidence corrections. Optimization failure may make the bound smaller, but it does not create a false privacy violation if the statistical certificate is valid.
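A small Python illustration of the lower-bound property follows. It uses one Rényi analogue of the Donsker–Varadhan formula, of the type characterized by Anantharam [2]; the paper's exact objective and neural parameterization may differ. For $\alpha>1$ and any critic $g$, Hölder's inequality with exponents $\alpha$ and $\alpha/(\alpha-1)$ gives $D_\alpha(P\Vert Q) \ge \frac{\alpha}{\alpha-1}\log \mathbb E_P[e^{(\alpha-1)g}] - \log \mathbb E_Q[e^{\alpha g}]$, with equality at $g=\log(dP/dQ)$. Here the critic class is restricted to linear functions $g_a(x)=ax$ instead of a neural network. The samples are hypothetical Gaussian statistics, for which $D_\alpha(\mathcal N(\mu,1)\Vert\mathcal N(0,1)) = \alpha\mu^2/2$ is known in closed form.

```python
import numpy as np


def log_mean_exp(v: np.ndarray) -> float:
    """Numerically stable log(mean(exp(v)))."""
    m = float(np.max(v))
    return m + float(np.log(np.mean(np.exp(v - m))))


def renyi_dv_objective(g_p: np.ndarray, g_q: np.ndarray, alpha: float) -> float:
    """Empirical alpha/(alpha-1) * log E_P[e^{(alpha-1) g}] - log E_Q[e^{alpha g}],
    from critic values g_p = g(X), X ~ P, and g_q = g(Y), Y ~ Q (alpha > 1).
    In population this is <= D_alpha(P || Q) for every critic g."""
    return alpha / (alpha - 1.0) * log_mean_exp((alpha - 1.0) * g_p) - log_mean_exp(alpha * g_q)


rng = np.random.default_rng(0)
alpha, mu, n = 1.5, 1.0, 20_000
x_p = rng.normal(mu, 1.0, n)   # hypothetical canary-in statistics, P = N(mu, 1)
x_q = rng.normal(0.0, 1.0, n)  # hypothetical canary-out statistics, Q = N(0, 1)
true_div = alpha * mu**2 / 2   # closed-form D_alpha(N(mu, 1) || N(0, 1))

# Restricted critic class g_a(x) = a * x; additive constants cancel in the objective.
grid = np.linspace(-2.0, 2.0, 81)
estimate = max(renyi_dv_objective(a * x_p, a * x_q, alpha) for a in grid)
print(f"best linear-critic value {estimate:.3f} vs true D_alpha {true_div:.3f}")
```

In population, the linear-critic objective is $\alpha a\mu - \alpha a^2/2$, maximized at $a=\mu$, where it equals $\alpha\mu^2/2$. This class happens to contain the true log-ratio up to a constant, so the bound is tight. The printed value still fluctuates around the truth because it is computed from finite samples, and maximizing over critics on the same samples can push it slightly above. That is why a certified audit needs the finite-sample confidence correction, not just the optimized objective.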

MINE and neural estimation of information leakage

MINE (Mutual Information Neural Estimation) [6] popularized the use of neural networks to estimate information-theoretic quantities through variational objectives. In MINE, a neural critic is trained to distinguish samples from a joint distribution from samples from the product of marginals, thereby estimating mutual information.

Our paper relies on this neural-estimation technique for privacy auditing. Instead of estimating mutual information, the auditor estimates a variational Rényi divergence between two loss distributions: the distribution obtained when the canary is absent and the distribution obtained when the canary is present.

There are two important implementation details.

First, the objective involves exponentials of the critic output. This can create high variance, especially for larger Rényi orders $\alpha$. Second, the expectations in the objective must be approximated using minibatches. Naive minibatch estimates can be unstable because exponential moments are sensitive to outliers. To address this, the paper follows the MINE approach of using minibatching together with an exponential moving average, or EMA. EMA stabilizes the stochastic gradients by smoothing estimates of the exponential terms across minibatches. The paper explicitly notes that this is especially helpful for larger Rényi orders, although variants of this estimator can still have high mean-squared error at larger $\alpha$. For this reason, the experiments focus on $\alpha\in(1,2]$, where the estimator is more reliable.
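The variance problem, and what an exponential moving average does about it, can be seen without any training. In the Python sketch below, the critic is frozen at a toy $g(y)=y$ and $Q=\mathcal N(0,1)$; both are assumptions. Each minibatch therefore gives a noisy estimate of $\mathbb E_Q[e^{\alpha g}]$, and the EMA pools information across minibatches. This only illustrates the smoothing effect on the estimate itself. In MINE, the moving average is used inside the gradient of the log term while the critic is being trained, and the paper's exact update may differ.

```python
import numpy as np


def ema(values, beta: float = 0.1) -> np.ndarray:
    """m_1 = v_1, then m_t = (1 - beta) * m_{t-1} + beta * v_t."""
    out = np.empty(len(values))
    m = float(values[0])
    out[0] = m
    for t in range(1, len(values)):
        m = (1.0 - beta) * m + beta * values[t]
        out[t] = m
    return out


rng = np.random.default_rng(0)
alpha, batch, steps, burn_in = 2.0, 64, 2000, 100
# Minibatch estimates of E_Q[exp(alpha * g(Y))] with the frozen toy critic g(y) = y.
batch_means = np.array([np.exp(alpha * rng.normal(size=batch)).mean() for _ in range(steps)])
naive = np.log(batch_means)          # log of each raw minibatch estimate
smoothed = np.log(ema(batch_means))  # log of the EMA-smoothed estimate
print(f"true value {alpha**2 / 2:.2f}")
print(f"naive: mean {naive[burn_in:].mean():.2f}, std {naive[burn_in:].std():.3f}")
print(f"EMA:   mean {smoothed[burn_in:].mean():.2f}, std {smoothed[burn_in:].std():.3f}")
```

By Jensen's inequality for the concave logarithm, the naive log-estimates also sit below the true value on average. This downward bias worsens as the exponential moments become heavier-tailed, which is one way to see why larger orders are harder to estimate.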

How this improves on previous DP auditing work

Earlier DP auditing work used data poisoning attacks to audit DP-SGD and showed that empirical lower bounds on privacy parameters could exceed naive theoretical analysis. Later work improved the tightness of audits, reduced the number of required training runs, or adapted to different threat models. Some white-box audits exploit knowledge of the DP mechanism and access to intermediate information, while more recent work has explored one-run auditing through connections between DP and generalization. (Our paper’s improvement is not simply that it gets better empirical numbers, although it often does. The deeper improvement is that it changes the object being audited.)

Prior methods often focus on pure or approximate DP, membership inference, data poisoning, or attack-specific distinguishers. These are powerful but indirect for systems whose guarantees are expressed through RDP. This paper directly audits Rényi divergence, the quantity appearing in the RDP definition.

Other works, based on DP violation detection and DP-Finder methods, search for counterexamples or privacy violations. These tools are useful for debugging and falsifying incorrect implementations. But the paper argues that they are not designed to provide tight, sample-valid confidence bounds for correct algorithms, nor do they address RDP auditing with optimality guarantees.

The paper’s contribution can therefore be summarized in three improvements:

Directness: it audits RDP by estimating Rényi divergence, rather than auditing another privacy notion and converting indirectly.
Statistical validity: it gives explicit non-asymptotic confidence intervals, separating estimation error from true algorithmic privacy leakage.
Optimality: it proves minimax lower bounds showing that the sample-complexity guarantees are essentially optimal up to logarithmic factors.

This combination is what makes the result more than another empirical attack. We establish a statistical theory of black-box RDP auditing.

Experimental findings

The empirical section evaluates the auditing method on DP-SGD for image classification tasks, including MNIST and CIFAR-10. The auditor collects 500 loss observations for each canary-in and canary-out condition, then estimates Rényi divergence using the DV-Rényi model. The experiments compare the resulting black-box RDP audits against a prior state-of-the-art black-box auditing method, with conversions among $\mu$-GDP, $(\varepsilon,\delta)$-DP, and RDP where needed. The results show strong gains at small and moderate Rényi orders, especially $\alpha=1.25$ and $\alpha=2$. For example, at $\alpha=1.25$, the DV-Rényi auditor improves over the prior black-box baseline across the reported MNIST and CIFAR-10 privacy regimes. At $\alpha=2$, the method also improves in many low and moderate privacy regimes, though the results show that performance can degrade at larger target privacy levels, consistent with the known instability of exponential-moment estimators at larger orders or larger divergences.

Our discussion emphasizes that these improvements come from directly auditing Rényi divergence rather than relying on indirect privacy conversions or attack-specific heuristics. The use of worst-case initialization increases the canary’s influence on training dynamics while preserving the validity of the DP-SGD privacy guarantee. The empirical story is therefore aligned with the theory: the auditor is powerful because it estimates the right quantity. (And it is principled because the estimate comes with finite-sample guarantees.)

Open questions

Our paper closes by mentioning several future directions, including extending optimal Rényi auditing guarantees to interactive and distributed settings, and exploring alternative variational formulas with lower variance at larger Rényi orders $\alpha$.

Ben has already started thinking about some of these directions, especially this one:

Can we apply our techniques to audit distributed and federated private learning?

Our paper focuses on black-box auditing of DP-SGD in a centralized setting. But many privacy-sensitive systems are distributed: federated learning, secure aggregation, data markets, collaborative analytics, and multi-party training. In such systems, privacy leakage may arise not only from the final model but also from messages, participation patterns, aggregation protocols, or side information. Extending optimal RDP auditing to distributed and interactive mechanisms would significantly broaden the applicability of the framework.

References

[1] Monroe Donsker and S. R. Srinivasa Varadhan. Asymptotic evaluation of certain Markov process expectations for large time, IV. Communications on Pure and Applied Mathematics, 36(2):183–212, 1983.

[2] Venkat Anantharam. A variational characterization of Rényi divergences. IEEE Transactions on Information Theory, 64(11):6979–6989, 2018.

[3] Meenatchi Sundaram Muthu Selva Annamalai and Emiliano De Cristofaro. Nearly tight black-box auditing of differentially private machine learning. Advances in Neural Information Processing Systems, 37:131482–131502, 2024.

[4] Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, and Andreas Terzis. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23), pages 1631–1648, 2023.

[5] Thomas Steinke, Milad Nasr, and Matthew Jagielski. Privacy auditing with one (1) training run. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023.

[6] Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and Devon Hjelm. Mutual information neural estimation. In Proceedings of the 35th International Conference on Machine Learning (ICML). PMLR, 2018.

The Existence of Error-Correcting Codes Implies Privacy Lower Bounds

Excited to share this recent IEEE BITS article in the Special Issue on Error-Correcting Codes [1].

Abstract

We discuss how the lens of error-correcting codes yields lower bounds on the privacy-utility tradeoff in differential privacy (DP). Reconstruction attacks, packing and covering arguments, and fingerprinting codes can all be interpreted as coding-theoretic tools: they allow for analysis of families of datasets whose query-answer patterns form (possibly randomized) codewords with large pairwise distance. Any DP mechanism answering these queries too accurately reveals enough structure to enable “decoding”; that is, identifying a user or reconstructing large parts of the dataset. By presenting each lower-bound technique through an explicit coding viewpoint, this survey unifies classical results on counting queries with recent advances in statistical estimation and high-dimensional learning, including Gaussian covariance estimation. We conclude with open problems at the intersection of coding theory and differential privacy.

[1] https://ieeexplore.ieee.org/document/11481645

Why (and How) Things Work

In Honor of David Blackwell

Author Archives for Daniel Alabi
