Auditing Differential Privacy via Donsker-Varadhan Representations

tl;dr One of my graduating students (Benjamin D. Kim) will be presenting a chapter of his M.S. thesis he completed at the University of Illinois at Urbana-Champaign (UIUC)! This work, on DP Rényi auditing, will appear at the TPDP 2026 workshop (and again at an ICML workshop this summer). Ben begins his Ph.D. at MIT EECS in the fall. Wishing him the best of luck as he begins a new journey. Here is the arxiv posting of the paper.

Differential privacy has become one of the dominant frameworks for protecting sensitive information in machine learning. In its ideal form, a differentially private algorithm guarantees that the presence or absence of any single person’s data has only a limited effect on the distribution of outputs. This is a powerful promise: even if an adversary sees the trained model, the released statistics, or some downstream prediction, they should not be able to infer too much about any one individual in the training set.

But as differentially private machine learning systems move from theory into practice, a natural question arises:

How do we know that a system claiming differential privacy is actually private?

That question is the starting point of our paper. The paper develops a black-box auditing framework for machine learning algorithms that claim Rényi differential privacy (RDP). The key idea is to treat privacy auditing as a statistical estimation problem: run the mechanism on neighboring datasets, observe its outputs, and estimate the Rényi divergence between the resulting output distributions. The paper’s main technical contribution is to do this using the Donsker–Varadhan variational representation of Rényi divergence, implemented with neural estimators inspired by MINE, and to prove finite-sample confidence guarantees and matching minimax lower bounds.

The paper contributes to the literature (e.g., see [3], [4], [5]) that moves privacy auditing closer to the role that cryptanalysis plays in cryptography: a necessary discipline for stress-testing, validating, and understanding the real security of deployed systems.


Why differential privacy needs auditing

A differential privacy guarantee is a theorem/lemma about a (randomized) mechanism. If an algorithm is correctly implemented, if the analysis is tight, if the accounting is correct, if the randomness is generated properly, and if all modeling assumptions are satisfied, then the promised privacy parameter should hold.

That is a lot of “ifs.”

In modern private machine learning, the most common algorithmic workhorse is DP-SGD: differentially private stochastic gradient descent. DP-SGD clips per-example gradients, adds Gaussian noise, and composes privacy loss over many training steps. In practice, privacy accounting is often expressed using Rényi differential privacy because RDP composes cleanly and is central to modern privacy accountants. The paper emphasizes that DP-SGD and related private learning systems are routinely deployed with RDP guarantees because of these tight composition properties.

However, the existence of a theoretical privacy analysis does not eliminate the need for empirical validation. Auditing matters for at least four reasons.

First, implementations can be wrong. Gradient clipping may be misapplied, random seeds may be mishandled, batching may differ from the analyzed model, or the privacy accountant may be used incorrectly. A privacy theorem protects the algorithm as specified, not necessarily the code that is actually deployed.

Second, privacy analyses can be loose. An algorithm may satisfy a formal upper bound, but the true privacy leakage may be much smaller or, in some cases, larger than expected under a flawed analysis. Auditing gives empirical lower bounds on the true leakage and helps assess whether the accounting is informative.

Third, privacy claims need external validation. In deployed systems, users, regulators, and scientific reviewers may not be satisfied with a claimed value of \varepsilon. They may want evidence that the implementation behaves as advertised.

Fourth, auditing can reveal the operational meaning of privacy parameters. Even when a theoretical guarantee is correct, it may be difficult to interpret. Empirical attacks and audits help translate abstract divergence bounds into observable distinguishability.

This is why our paper frames privacy auditing as the counterpart to privacy accounting. Privacy accounting gives an upper bound: “the mechanism should leak at most this much.” Privacy auditing gives a lower bound: “we can empirically demonstrate at least this much leakage.” A good auditing method narrows the gap between these two quantities.


DP auditing as cryptanalysis for private machine learning

A useful analogy is with cryptography.

In cryptography, a proposed encryption scheme, signature scheme, or zero-knowledge protocol is not considered trustworthy merely because its designers believe it is secure. The community tries to break it. Cryptanalysts look for distinguishing attacks, key-recovery attacks, side-channel attacks, malleability attacks, and implementation vulnerabilities. A failed attack does not prove security, but strong cryptanalysis increases confidence; a successful attack exposes a gap between the claimed security and the actual behavior.

DP auditing plays a similar role for privacy-preserving machine learning.

A claimed DP mechanism is like a cryptographic construction. Its privacy proof is like a security reduction or theorem. An audit is like an attack: the auditor tries to distinguish whether a specific individual’s data was included in training. If the auditor can distinguish the “in” world from the “out” world too well, then the mechanism leaks more information than expected.

The analogy is especially clear in membership-inference audits. The attacker constructs two neighboring datasets: one containing a special record, often called a canary, and one without it. The mechanism is run many times on both datasets. The auditor then observes the outputs and tries to distinguish which dataset was used. If the output distributions are far apart, the canary has left a detectable trace.

But there is also an important difference between cryptanalysis and DP auditing. In cryptography, the usual goal is to find any efficient adversary that violates a security definition. In DP auditing, especially in this paper, the goal is more quantitative: estimate a divergence between distributions and attach a statistically valid confidence statement to that estimate. The auditor is not merely saying “I found an attack.” The auditor is saying something closer to:

With high confidence, the Rényi divergence between the mechanism’s outputs on these neighboring datasets is at least this value.

That makes the audit directly comparable to an RDP claim. This is one of the central conceptual advances of the paper. Rather than auditing DP indirectly through a particular attack heuristic, the paper audits the same mathematical object that appears in the privacy definition: the Rényi divergence between neighboring output distributions.


From DP to Rényi DP

Pure/approximate differential privacy says that a randomized mechanism M is (\varepsilon,\delta)-DP if, for all neighboring datasets D,D' and all measurable events S,

Pr[M(D)\in S]\leq e^\varepsilon Pr[M(D')\in S]+\delta.

Rényi differential privacy instead bounds the Rényi divergence between the output distributions:

D_\alpha(M(D)\Vert M(D')) \leq \varepsilon_\alpha,

for a Rényi order \alpha>1. In our paper, we recall this definition and notes that RDP can be converted back into approximate DP using the standard conversion: if a mechanism satisfies (\alpha,\varepsilon_\alpha)-RDP, then it also satisfies (\varepsilon_\alpha+\log(1/\delta)/(\alpha-1),\delta)-DP.

RDP is particularly useful for machine learning because privacy loss accumulates over many iterations of training. DP-SGD may run for hundreds or thousands of steps. RDP gives a convenient and often tight way to account for this composition. That is why modern DP-SGD analyses often report privacy through an RDP accountant, even if the final result is converted into (\varepsilon,\delta)-DP. But this creates a mismatch in the auditing literature. Many prior audits were designed around pure or approximate DP, membership inference, data poisoning, or hypothesis testing formulations. These are valuable, but they do not directly estimate the RDP quantity that modern privacy accountants actually track. The paper argues that this is a gap: if deployed systems claim RDP guarantees, then auditors should be able to audit RDP directly.


The black-box auditing setting

The paper focuses on black-box auditing. In this setting, the auditor does not inspect the internal gradients, random noise, or intermediate training trajectory. Instead, the auditor can choose inputs, run the training mechanism, and observe outputs or post-processed outputs.

For DP-SGD, the black-box audit proceeds roughly as follows: The auditor chooses a canary example (x',y'). The mechanism is trained repeatedly on a dataset D without the canary and on the neighboring dataset D\cup{(x',y')} with the canary. After training, the auditor measures the loss of the trained model on the canary. This produces two empirical distributions of losses: one from canary-absent training and one from canary-present training. The audit then estimates the Rényi divergence between these two loss distributions. This is a natural black-box attack because if the model behaves differently on the canary depending on whether it was included in training, then the canary has influenced the learned model. In privacy terms, the output distributions M(D) and M(D') are distinguishable.

The paper also uses a technique from prior work: worst-case initialization [3]. DP-SGD’s privacy guarantee is unaffected by the choice of initial parameters before private training, so the auditor can choose initial parameters that make training more sensitive to the canary. The paper follows the method of crafting such initializations by pretraining on a separate part of the dataset. This increases the statistical power of the audit while remaining within a black-box threat model. Again, this is analogous to cryptanalysis: a cryptanalyst often chooses adversarial plaintexts, messages, or protocol inputs to expose weaknesses. Here, the privacy auditor chooses a canary and initialization that make the privacy loss easier to detect.


Why estimating Rényi divergence is hard

The central statistical problem is this:

Given samples from two unknown distributions P and Q, estimate or lower bound D_\alpha(P\Vert Q).

This is challenging because the distributions may be high-dimensional, implicit, and accessible only through samples. In machine learning, the output distribution of a randomized training algorithm may be a distribution over model parameters, predictions, losses, or other post-processed statistics. The density ratio P(x)/Q(x) is not available. Direct plug-in estimation is usually impossible.

This is where variational representations become useful.

A variational representation rewrites a divergence as a supremum over functions. Instead of needing the density ratio explicitly, one searches over a class of critic functions (denoted by T in our work) that try to distinguish samples from P and Q. If the critic class is rich enough, optimizing the variational objective recovers the divergence. If the critic class is restricted, the objective gives a lower bound. This leads to the philosophy behind neural divergence estimation: train a neural network critic to maximize a divergence objective.


The Donsker-Varadhan Representation

The Donsker-Varadhan variational formula [1][2] is a result that expresses certain information-theoretic quantities, most famously KL divergence, as a supremum over test functions. Our paper uses a Rényi-divergence analogue of this variational perspective.

The key insight is that the unknown density ratio is replaced by an optimization problem over functions. In practice, the auditor restricts to a neural network class. The resulting class-restricted objective is a lower bound on the full variational divergence, because the supremum is taken over a restricted class.

For auditing, this lower-bound property is a feature, not a bug. A privacy audit should be conservative. If the neural critic certifies a large lower bound on Rényi divergence, then the true divergence is at least that large, subject to statistical confidence corrections. Optimization failure may make the bound smaller, but it does not create a false privacy violation if the statistical certificate is valid.


MINE and neural estimation of information leakage

MINE (Mutual Information Neural Estimation) [6], popularized the use of neural networks to estimate information-theoretic quantities through variational objectives. In MINE, a neural critic is trained to distinguish samples from a joint distribution from samples from the product of marginals, thereby estimating mutual information.

Our paper relies on this neural-estimation technique for privacy auditing. Instead of estimating mutual information, the auditor estimates a variational Rényi divergence between two loss distributions: the distribution obtained when the canary is absent and the distribution obtained when the canary is present.

There are two important implementation details.

First, the objective involves exponentials of the critic output. This can create high variance, especially for larger Rényi orders \alpha. Second, the expectations in the objective must be approximated using minibatches. Naive minibatch estimates can be unstable because exponential moments are sensitive to outliers. To address this, the paper follows the MINE approach of using minibatching together with an exponential moving average, or EMA. EMA stabilizes the stochastic gradients by smoothing estimates of the exponential terms across minibatches. The paper explicitly notes that this is especially helpful for larger Rényi orders, although variants of this estimator can still have high mean-squared error at larger \alpha. For this reason, the experiments focus on \alpha\in(1,2], where the estimator is more reliable.


How this improves on previous DP auditing work

Earlier DP auditing work used data poisoning attacks to audit DP-SGD and showed that empirical lower bounds on privacy parameters could exceed naive theoretical analysis. Later work improved the tightness of audits, reduced the number of required training runs, or adapted to different threat models. Some white-box audits exploit knowledge of the DP mechanism and access to intermediate information, while more recent work has explored one-run auditing through connections between DP and generalization. (Our paper’s improvement is not simply that it gets better empirical numbers, although it often does. The deeper improvement is that it changes the object being audited.)

Prior methods often focus on pure or approximate DP, membership inference, data poisoning, or attack-specific distinguishers. These are powerful but indirect for systems whose guarantees are expressed through RDP. This paper directly audits Rényi divergence, the quantity appearing in the RDP definition.

Other works (based on DP violation detection and DP-Finder methods), search for counterexamples or privacy violations. These tools are useful for debugging and falsifying incorrect implementations. But the paper argues that they are not designed to provide tight, sample-valid confidence bounds for correct algorithms, nor do they address RDP auditing with optimality guarantees.

The paper’s contribution can therefore be summarized in three improvements:

  1. Directness: it audits RDP by estimating Rényi divergence, rather than auditing another privacy notion and converting indirectly.
  2. Statistical validity: it gives explicit non-asymptotic confidence intervals, separating estimation error from true algorithmic privacy leakage.
  3. Optimality: it proves minimax lower bounds showing that the sample-complexity guarantees are essentially optimal up to logarithmic factors.

This combination is what makes the result more than another empirical attack. We establish a statistical theory of black-box RDP auditing.


Experimental findings

The empirical section evaluates the auditing method on DP-SGD for image classification tasks, including MNIST and CIFAR-10. The auditor collects 500 loss observations for each canary-in and canary-out condition, then estimates Rényi divergence using the DV-Rényi model. The experiments compare the resulting black-box RDP audits against a prior state-of-the-art black-box auditing method, with conversions among \mu-GDP, (\varepsilon,\delta)-DP, and RDP where needed. The results show strong gains at small and moderate Rényi orders, especially \alpha=1.25 and \alpha=2. For example, at \alpha=1.25, the DV-Rényi auditor improves over the prior black-box baseline across the reported MNIST and CIFAR-10 privacy regimes. At \alpha=2, the method also improves in many low and moderate privacy regimes, though the results show that performance can degrade at larger target privacy levels, consistent with the known instability of exponential-moment estimators at larger orders or larger divergences.

Our discussion emphasizes that these improvements come from directly auditing Rényi divergence rather than relying on indirect privacy conversions or attack-specific heuristics. The use of worst-case initialization increases the canary’s influence on training dynamics while preserving the validity of the DP-SGD privacy guarantee. The empirical story is therefore aligned with the theory: the auditor is powerful because it estimates the right quantity. (And it is principled because the estimate comes with finite-sample guarantees.)


Open questions

Our paper closes by mentioning several future directions, including extending optimal Rényi auditing guarantees to interactive and distributed settings, and exploring alternative variational formulas with lower variance for larger \alpha>1.

Ben has already started thinking about some of these directions, especially this one:

Can we apply our techniques to audit distributed and federated private learning?

Our focuses on black-box auditing of DP-SGD in a centralized setting. But many privacy-sensitive systems are distributed: federated learning, secure aggregation, data markets, collaborative analytics, and multi-party training. In such systems, privacy leakage may arise not only from the final model but also from messages, participation patterns, aggregation protocols, or side information. Extending optimal RDP auditing to distributed and interactive mechanisms would significantly broaden the applicability of the framework.

References

[1] Monroe Donsker and S. R. Srinivasa Varadhan. Asymptotic evaluation of certain markov
process expectations for large time, IV.
Communications on Pure and Applied Mathematics,
36(2):183–212, 1983.

[2] Venkat Anantharam. A variational characterization of rényi divergences. IEEE Transactions on
Information Theory, 64(11):6979–6989, 2018.

[3] Meenatchi Sundaram Muthu Selva Annamalai and Emiliano De Cristofaro. Nearly tight
black-box auditing of differentially private machine learning.
Advances in Neural Information
Processing Systems, 37:131482–131502, 2024.

[4] Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski,
Nicholas Carlini, and Andreas Terzis. Tight auditing of differentially private machine learning.
In 32nd USENIX Security Symposium (USENIX Security 23), pages 1631–1648, 2023.

[5] Thomas Steinke, Milad Nasr, and Matthew Jagielski. Privacy auditing with one (1) training run.
In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023.

[6] Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio,
Aaron Courville, and Devon Hjelm. Mutual information neural estimation. In Proceedings of
the 35th International Conference on Machine Learning (ICML). PMLR, 2018

The Existence of Error-Correcting Codes Implies Privacy Lower Bounds

Excited to share this recent IEEE BITS article in the Special Issue on Error-Correcting Codes [1].

Abstract

We discuss how the lens of error-correcting codes yields lower bounds on the privacy-utility tradeoff in differential privacy (DP). Reconstruction attacks, packing and covering arguments, and fingerprinting codes can all be interpreted as coding-theoretic tools: they allow for analysis of families of datasets whose query-answer patterns form (possibly randomized) codewords with large pairwise distance. Any DP mechanism answering these queries too accurately reveals enough structure to enable “decoding”; that is, identifying a user or reconstructing large parts of the dataset. By presenting each lower-bound technique through an explicit coding viewpoint, this survey unifies classical results on counting queries with recent advances in statistical estimation and high-dimensional learning, including Gaussian covariance estimation. We conclude with open problems at the intersection of coding theory and differential privacy.

[1] https://ieeexplore.ieee.org/document/11481645

Secure Non-Interactive Reductions

A recurring theme in cryptography is that correlation is a resource. Alice and Bob may start with a source correlation CC, say noisy copies of a bit or erasures. The hope is to convert it into a more useful target correlation DD. What changes dramatically is whether they are allowed to talk, and whether we only care about correctness or also about security. The recent SNIR/SNIS line of work makes that distinction precise and surprisingly rigid.

This post explains that landscape in a way I hope is useful. The main message/tl;dr is:

  • ordinary non-secure non-interactive reductions ask only whether local maps can reproduce the right joint law;
  • secure non-interactive reductions ask for that plus privacy against each party;
  • interactive reductions are much more powerful because communication can reshape the parties’ views before the final local-output step.

Recently, I have also been working with colleagues in ECE and Physics on quantum secure non-interactive reductions, which aim to understand when one quantum correlation resource can be transformed into another quantum (or classical) resource without communication and while preserving the appropriate notion of security against each party. In the quantum setting, the picture is even more delicate: the relevant resources may be shared quantum states, measurements may disturb systems, side information may be entangled, and the right privacy conditions must account for quantum adversaries and quantum views rather than just classical random variables. Our framework for thinking about that problem relies heavily on the formalization of the classical or non-quantum version, which is the subject of this post. Put differently, before asking what secure non-interactive reducibility should mean for shared entangled states or quantum correlations, it is essential to first understand the classical theory: what ordinary non-interactive reductions allow, what secure non-interactive reductions forbid, and why interaction fundamentally changes the reducibility landscape. I will also prove a simple lemma showing that secure non-interactive reducibility is strictly stronger than ordinary non-interactive reducibility.

The three models

Fix a source correlation C=(X,Y)C=(X,Y) and a target correlation D=(U,V)D=(U,V).

Non-secure non-interactive reduction (NIR / NIS)

Alice gets XnX^n, Bob gets YnY^n, they do no communication, and output U=f(Xn),V=g(Yn)U’ = f(X^n), V’ = g(Y^n), or randomized versions using local coins. The goal is for (U,V)(U, V) to be close in statistical distance (or total variation distance) to (U,V)(U’, V’).

That is the classical non-interactive simulation perspective: only correctness of the output law matters. The SNIS paper explicitly contrasts this with SNIS, noting that NIS “only considers correctness (not security).”

Secure non-interactive reduction (SNIR / SNIS)

Now Alice and Bob still do not communicate, but in addition to correctness, the reduction must hide whatever the target correlation is supposed to hide. In the formulation used in the SNIS paper, a reduction from (U,V)(U,V) to (X,Y)n(X,Y)^{\otimes n} must satisfy:

  1. Correctness: (U,V)(U’,V’) is close (in statistical distance) to (U,V)(U,V).
  2. Security against corrupt Alice: conditioned on the final outputs (u,v)(u,v), Alice’s view should be close to depending only on uu, not on the extra value vv.
  3. Security against corrupt Bob: symmetrically, Bob’s view should be close to depending only on vv, not on the extra value uu.

This is the key strengthening. In NIR, a party may silently carry “too much knowledge” about the other party’s output, as long as the final joint distribution looks right. In SNIR, that is disallowed.

Interactive secure reductions

Here Alice and Bob may exchange messages before producing outputs. An interactive secure reduction can be seen as having two phases. First, an interaction phase transforms the parties’ views. Second, a local derivation phase maps those final views to outputs, and the security condition applies to this derivation step.

That decomposition is conceptually important: SNIR isolates the “final local secure derivation” part while forbidding the interaction that might create the right intermediate views in the first place. This is why interactive reductions can be much stronger.

Why SNIR is the right cryptographic strengthening

NIR asks “can I simulate the right distribution?”
SNIR asks “can I simulate the right distribution without the wrong leakage?”

The SNIS paper states this very plainly: in NIS, “erasing information from parties’ views… is permissible,” but that may not be cryptographically secure. SNIS was introduced precisely to capture non-interactive secure conversion of one correlation into another. This matters because many correlations that are “equivalent enough” from a purely distributional viewpoint stop being interchangeable once one insists on simulation-based privacy.

SNIR is strictly stronger than non-secure NIR

Here is a simple lemma/example that captures the basic separation.

Lemma/example

There exist correlations CC and DD such that:

  1. DD has a perfect non-secure non-interactive reduction from CC, but
  2. DD has no perfect secure non-interactive reduction from CC.

Proof

Let S,TS,T be independent uniform bits.

Define the source correlationC=((S,T),T).C = ((S,T),\, T).

So Alice receives the pair (S,T)(S,T), while Bob receives only TT.

Define the target correlationD=(S,T).D = (S,T).

Step 1: There is a perfect non-secure non-interactive reduction

Alice outputsU:=S,U’ := S,

and Bob outputsV:=T.V’ := T.

No communication is used. Since S,TS,T are independent uniform bits, the joint distribution of (U,V)(U’,V’) is exactly (S,T)(S,T), which is exactly the target correlation DD.

So DD has a perfect NIR from CC.

Step 2: There is no perfect secure non-interactive reduction

Assume for contradiction that the above source-to-target conversion were a perfect SNIR.

In the target correlation D=(S,T)D=(S,T), Alice’s output is U=SU=S and Bob’s output is V=TV=T. Since SS and TT are independent, a correct secure realization of DD should not let Alice’s view reveal Bob’s output TT beyond what is implied by her own output SS.

But in the source CC, Alice’s view isX=(S,T).X=(S,T).

Conditioned on the target outputs (U,V)=(s,t)(U,V)=(s,t), Alice’s view is deterministicallyX=(s,t).X=(s,t).

Hence the conditional law of Alice’s view depends on tt . i.e., on Bob’s output VV, not merely on Alice’s own output U=sU=s.

This violates the security-against-Alice requirement for SNIR.

Therefore no perfect SNIR from CC to DD exists.

What this lemma implies

This toy example is simple, but it isolates the exact distinction.

  • In NIR, it is fine that Alice initially knows both SS and TT; all we asked for was the right final law.
  • In SNIR, that extra knowledge is fatal, because Alice learns Bob’s target output in a way the target correlation itself does not permit.

Ordinary non-interactive reduction

“Can I locally compress, discard, relabel, or threshold my view so that the joint output law matches the target?”

This is fundamentally an information-theoretic simulation question.

Secure non-interactive reduction

“Can I do that without retaining forbidden side information about the other party’s target output?”

This is a simulation-based-privacy question.

Interactive reduction

“Can I first reengineer the two views using communication, and only then derive the target securely?”

This is why interaction is stronger.


References

[1] Agarwal, Narayanan, Pathak, Prabhakaran, Prabhakaran, Rehan, “Secure Non-Interactive Reduction and Spectral Analysis of Correlations” (Eurocrypt 2022 / ePrint 2022/262), especially the spectral criterion, mirroring perspective, and incompleteness results.

[2] Bhushan, Misra, Narayanan, Prabhakaran, “Secure Non-Interactive Reducibility is Decidable” (TCC 2022 / ePrint 2022/1457), for decidability of SNIR feasibility.

[3] Khorasgani, Maji, Nguyen, “Secure Non-interactive Simulation: Feasibility and Rate” (Eurocrypt 2022), for the simulation-based definition, comparison with NIS and OWSC, and exact rate/feasibility characterizations for BSS/BES families.