Estimating Quantiles in Bounded Space

Simply put, quantile estimation is the problem of estimating the empirical or population cumulative distribution function (CDF) of a sample or a distribution. The amount of space or number of samples the estimator needs depends on how many times we need to query the CDF and how accurate the answers must be. In this short exposition, we focus on empirical quantile estimation.

We define the q-quantile to be the element x_q with rank \lfloor q \cdot n\rfloor among n elements. The \alpha-approximate q-quantile is any element with rank between \lfloor (q-\alpha)\cdot n\rfloor and \lfloor (q+\alpha)\cdot n\rfloor.

Example

Given the sample {29, 62, 83, 53, 12, 21, 50, 87, 86, 2}, the 0.1-quantile, 0.2-quantile, and median (0.5-quantile) are 2, 12, and 50, respectively. A straightforward way to obtain quantiles is to sort the dataset and then pick the element at position \lfloor q\cdot n\rfloor. This method only works in the offline (non-streaming) setting. For \alpha-approximate quantiles, the 0.1-approximate 0.2-quantile is any element whose rank lies between \lfloor 0.1\cdot 10\rfloor = 1 and \lfloor 0.3\cdot 10\rfloor = 3, i.e., any element of {2, 12, 21}.
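To make the offline approach concrete, here is a minimal Python sketch (an illustration, not production code) that sorts and then indexes; recall that rank r corresponds to sorted index r - 1:

    import math

    def quantile(data, q):
        # Offline q-quantile: sort, then pick the element of rank
        # floor(q * n). Ranks are 1-indexed, so rank r sits at index r - 1.
        xs = sorted(data)
        rank = math.floor(q * len(xs))
        return xs[max(rank - 1, 0)]

    sample = [29, 62, 83, 53, 12, 21, 50, 87, 86, 2]
    print(quantile(sample, 0.1), quantile(sample, 0.2), quantile(sample, 0.5))
    # prints: 2 12 50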

Greenwald-Khanna

From previous work [1], it is known that exact computation of quantiles in a single pass over the dataset requires space linear in the stream length. So one must resort to approximation in order to save space. This becomes especially important when performing analyses on very large datasets (think trillions of data records), where approximate computation of quantiles in bounded space might be the only feasible option. Greenwald and Khanna [3] present a deterministic algorithm for computing approximate quantiles in bounded space. We build on their algorithm to provide private quantiles [2].
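To give a flavor of the data structure, here is a simplified, unoptimized Python sketch in the spirit of Greenwald-Khanna. It is a toy illustration (the insertion, compression schedule, and bookkeeping only loosely follow the paper), not a faithful reimplementation of [3]:

    import math

    class GKSketch:
        # Toy Greenwald-Khanna-style sketch for eps-approximate quantiles.
        # Stores entries [value, g, delta], where g is the gap between the
        # minimum possible ranks of consecutive stored values and delta
        # bounds the uncertainty in this value's rank.

        def __init__(self, eps):
            self.eps = eps
            self.n = 0
            self.entries = []

        def insert(self, v):
            i = 0  # linear scan; a real implementation would binary-search
            while i < len(self.entries) and self.entries[i][0] < v:
                i += 1
            # A new minimum or maximum has exactly known rank (delta = 0).
            delta = 0 if i in (0, len(self.entries)) \
                else math.floor(2 * self.eps * self.n)
            self.entries.insert(i, [v, 1, delta])
            self.n += 1
            if self.n % max(int(1 / (2 * self.eps)), 1) == 0:
                self._compress()

        def _compress(self):
            # Merge neighbors whose combined rank uncertainty fits the budget.
            cap = math.floor(2 * self.eps * self.n)
            i = len(self.entries) - 2
            while i >= 1:
                _, g1, _ = self.entries[i]
                v2, g2, d2 = self.entries[i + 1]
                if g1 + g2 + d2 < cap:
                    self.entries[i + 1] = [v2, g1 + g2, d2]
                    self.entries.pop(i)
                i -= 1

        def query(self, q):
            # Return a value whose rank is within eps * n of ceil(q * n).
            target = math.ceil(q * self.n)
            slack = math.floor(self.eps * self.n)
            rmin = 0
            for i, (v, g, d) in enumerate(self.entries):
                rmin += g
                if rmin + d > target + slack:
                    return self.entries[max(i - 1, 0)][0]
            return self.entries[-1][0]

The space saving comes from _compress: the sketch retains only O((1/\epsilon) \log(\epsilon n)) entries, versus the n values a sort-based method would store.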

With Differential Privacy

Recently, my collaborators and I devised algorithms for computing DP quantiles in bounded space, based upon the Greenwald-Khanna data structure for quantile estimation [2]: (i) our main algorithm uses the exponential mechanism to privately sample from intervals defined by the sketch; (ii) a second family of algorithms builds histograms using the sketch structure. Our algorithms use poly-logarithmic space and are experimentally evaluated on synthetic and real-world datasets. We find that the histogram-based algorithms perform best on more concentrated datasets and for the multiple-quantiles problem. In addition, we show theoretical utility bounds specialized to the normal distribution.
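To illustrate the flavor of approach (i), here is a hedged Python sketch of the standard exponential mechanism for quantiles run over intervals induced by a sketch. The function name and the inputs values/ranks (the stored values and their approximate ranks) are hypothetical, and this is not the exact algorithm of [2]:

    import math
    import random

    def dp_quantile_from_sketch(values, ranks, n, q, eps_dp, rng=random):
        # values: sorted values stored in the sketch (interval endpoints)
        # ranks:  approximate rank of each stored value
        # n, q:   stream length and desired quantile
        # eps_dp: differential-privacy parameter
        target = q * n
        intervals = list(zip(values[:-1], values[1:]))

        # Utility of an interval: 0 if it covers the target rank, else the
        # negative rank-distance to the target (sensitivity 1 under
        # adding/removing one stream element).
        def utility(i):
            lo_r, hi_r = ranks[i], ranks[i + 1]
            if lo_r <= target <= hi_r:
                return 0.0
            return -min(abs(target - lo_r), abs(target - hi_r))

        # Exponential mechanism: weight each interval by its width times
        # exp(eps_dp * utility / 2), sample an interval accordingly, and
        # return a uniform point inside it.
        weights = [max(hi - lo, 0.0) * math.exp(eps_dp * utility(i) / 2)
                   for i, (lo, hi) in enumerate(intervals)]
        pick = rng.uniform(0.0, sum(weights))
        acc = 0.0
        for w, (lo, hi) in zip(weights, intervals):
            acc += w
            if pick <= acc:
                return rng.uniform(lo, hi)
        return values[-1]

Because the mechanism only ever touches the sketch's stored values and ranks, its space usage is that of the sketch itself, which is what makes bounded-space DP quantiles possible.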

References

[1] J.I. Munro and M.S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science 12, 3 (1980), 315–323.

[2] Daniel Alabi, Omri Ben-Eliezer, and Anamay Chaturvedi. Bounded space differentially private quantiles. To appear in the Transactions on Machine Learning Research, 2023.

[3] Michael Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, USA, May 21-24, 2001, 58–66. https://doi.org/10.1145/375663.375670

Graph Generation Algorithms

As you might have already heard, generative AI systems, especially those based on large language models like ChatGPT, are quite powerful tools that (are supposed to) augment human routines and reasoning. Specifically, ChatGPT builds on the GPT (Generative Pre-trained Transformer) model, which has been pre-trained on vast amounts of text data so that it can generate contextually appropriate responses to user input. If you have not tried ChatGPT yet, I highly recommend giving it a go!

Other types of generative AI technologies include GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders). From a simple web search, you can learn more about the applications of generative technologies, ranging from creating poetry to generating realistic-looking images and video. Another important domain for generative techniques is graph data, which encodes, for example, our friendships on Facebook or our connections on LinkedIn. In general, graphs are a powerful abstraction for representing and analyzing complex systems, from biological systems to social networks. However, generating realistic graphs that capture certain properties of naturally occurring networks can be challenging. A promising approach to this problem is the use of stochastic Kronecker graphs.

Stochastic Kronecker graphs are generated from a small seed matrix by repeated Kronecker multiplication. The Kronecker product is a mathematical operation that combines two matrices to produce a larger matrix; by applying it repeatedly, we can generate a large and complex graph. For example, a starting 2×2 seed matrix could look like this:

S = 1/16\cdot\begin{pmatrix}9 & 3 \\ 3 & 1 \end{pmatrix}.

To generate a graph from the seed S, we apply the Kronecker product to S repeatedly, so that S^{\otimes t} defines a distribution over graphs on 2^t vertices: entry (u, v) of S^{\otimes t} is the probability that node u is connected to node v. This, essentially, is the generative process of stochastic Kronecker graphs. Equivalently, each entry of the resulting adjacency matrix is set to 1 with the probability given by the corresponding entry of S^{\otimes t}, and to 0 otherwise, independently of all other entries.
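As a minimal sketch of this process in Python/NumPy (the choice t = 4 is just for illustration):

    import numpy as np

    # Seed matrix from the text: entries are edge probabilities.
    S = np.array([[9, 3],
                  [3, 1]]) / 16.0

    def kronecker_power(S, t):
        # Compute S^{\otimes t}, a (2^t x 2^t) matrix of edge probabilities.
        P = S
        for _ in range(t - 1):
            P = np.kron(P, S)
        return P

    def sample_graph(P, seed=0):
        # Include edge (u, v) independently with probability P[u, v].
        rng = np.random.default_rng(seed)
        return (rng.random(P.shape) < P).astype(int)

    P = kronecker_power(S, 4)  # distribution over graphs on 2^4 = 16 vertices
    A = sample_graph(P)
    print(A.sum(axis=1))       # degree sequence of one sampled graph

Note that materializing S^{\otimes t} takes space quadratic in the number of vertices; for large graphs, one typically samples edges directly without building the full probability matrix.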

It is well known that many real-world networks have degree distributions that follow a lognormal or power-law distribution. However, stochastic Kronecker graph models cannot always generate graphs whose degree distributions follow these laws; the generated degree distributions tend to exhibit significant oscillations. In a recent work (paper forthcoming), my friend/collaborator and I analyze techniques that could remove these oscillations from the degree distributions of generated graphs; see Figure 1 below. We believe our work could lead to a better understanding and classification of graph generation algorithms.

Simons Society of Fellows Retreat 2023

I attended the Simons Society of Fellows retreat held March 17 – 20, 2023 in Florida. I asked ChatGPT to write me a blog post about the retreat, but it gave the following response:

So the entirety of this blog post was written by me!

I learned a great deal about many topics in the sciences, ranging from sphere packing all the way to mating patterns in mosquitoes. See https://www.simonsfoundation.org/event/simons-society-of-fellows-retreat-2023/ for the abstracts of my colleagues' talks. The title of my talk was Polynomial-time Robust Moment Estimation via Semidefinite Programming Hierarchies. I spoke about some aspects of my recent work that uses the sum-of-squares framework to design polynomial-time, sample-optimal estimators with robustness and privacy guarantees.

In the course of preparing my talk, I also learned some interesting facts about John Tukey (a pioneer in robust statistics). Some notable quotes attributed to Tukey [1]:

  • “The test of a good procedure is how well it works, not how well it is understood.”
  • “You can’t expect scientists to solve political problems.”
  • “By the end of late 1945, I was a statistician rather than a topologist.”

[1] David R. Brillinger. John W. Tukey: his life and professional contributions. Annals of Statistics 30(6) (December 2002), 1535–1575. https://doi.org/10.1214/aos/1043351246