Tradeoffs Matter: On Developing Lower Bounds

As I write this blog post, I just received news that the U.S. is designating Nigeria as a ‘country of particular concern’ over the persecution of Christians. I cannot think of another country with such a large population and (almost) equal representation of Christians and Muslims. Since before I was born, this split has caused a lot of friction, but somehow the country has survived it (thus far!). That friction stems from the tradeoffs of such religious heterogeneity, a topic for its own discussion. In this post, though, I’ll focus on tradeoffs of a more mathematical kind.

There’s something deeply human about lower bounds. They’re not just mathematical artifacts; they’re reflections of life itself. To me, a lower bound represents the minimum cost of achieving something meaningful. And in both life and research, there’s no escaping those costs.


The Philosophy: Tradeoffs Everywhere

Growing up, I was lucky enough to have certain people in my family spend endless hours guiding me: helping with schoolwork, teaching patience, pushing me toward growth. Looking back, I realize these people could have done a hundred other things with that time. But they (especially my mum) chose to invest it in me. That investment wasn’t free. It came with tradeoffs, the time she could never get back. But without that investment, I wouldn’t be who I am today.

That’s the thing about life: everything has a cost. In 2022/2023, I could have focused entirely on my research. But instead, I poured my energy into founding NaijaCoder, a technical education nonprofit for Nigerian students. It was rewarding, but also consuming. I missed out on months of uninterrupted research momentum. And yet, I have no regrets! Because that, too, was a lower bound. The minimum “cost” of building something (hopefully) lasting and impactful.

Every meaningful pursuit (love, growth, service, research) demands something in return. There are always tradeoffs. Anyone who claims otherwise is ignoring the basic laws of nature. You can’t have everything, and that’s okay. The beauty lies in understanding what must be given up for what truly matters.


Lower Bounds in Technical Domains

Mathematicians talk about lower bounds as the limits of efficiency. When we prove a lower bound, we’re saying: “No matter how clever you are, you can’t go below this.” It’s not a statement of despair: it’s a statement of truth.

Lower bounds define the terrain of possibility. They tell us what’s fundamentally required to solve a problem, whether it’s time, space, or communication. In a strange way, they remind me of the constraints in life. You can’t do everything at once. There’s a cost to progress. To prove a good lower bound is to acknowledge that reality has structure. There’s an underlying balance between effort and outcome. It’s an act of intellectual honesty: a refusal to pretend that perfection is free.

Nowhere do these tradeoffs feel more personal than in privacy. In the quest to protect individual data, we face the same universal truth: privacy has a price. If we want stronger privacy guarantees, we must give up some accuracy, some utility, some convenience. In differential privacy, lower bounds quantify that tension. They tell us that no matter how sophisticated our algorithms are, we can’t perfectly protect everyone’s data and keep every detail of the dataset intact. We must choose what to value more: the precision of our statistical estimates or the protection of the people behind them.
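To make the tension concrete, here is a minimal sketch (my own illustrative choice, not tied to any particular paper) of the textbook Laplace mechanism: the stronger the privacy guarantee (smaller epsilon), the more noise must be added, and the less accurate the released statistic becomes.

```python
import numpy as np

def laplace_mean(data, epsilon, lower=0.0, upper=1.0):
    """Release a differentially private mean via the Laplace mechanism.

    Each record lies in [lower, upper], so the mean has sensitivity
    (upper - lower) / n. The noise scale sensitivity/epsilon grows as
    epsilon shrinks: more privacy, less accuracy.
    """
    n = len(data)
    sensitivity = (upper - lower) / n
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(np.mean(data) + noise)

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=1000)
for eps in [0.01, 0.1, 1.0]:
    print(f"epsilon={eps:5.2f}  estimate={laplace_mean(data, eps):.4f}  "
          f"true={data.mean():.4f}")
```

No cleverness escapes this tradeoff; lower bounds in differential privacy say the noise (or something like it) is unavoidable.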

These aren’t technical inconveniences; they’re moral lessons. They remind us that every act of preservation requires some loss. Just as my mother’s care required time, or NaijaCoder required research sacrifices, protecting privacy requires accepting imperfection.

Acceptance

The pursuit of lower bounds (in research or in life) is about humility. It’s about recognizing that limits aren’t barriers to doing good work; they’re the context in which doing good becomes possible.

Understanding lower bounds helps us stop chasing the illusion of “free perfection.” It helps us embrace the world as it is: a world where tradeoffs are natural, where effort matters, and where meaning is found not in escaping limits but in working within them gracefully.

So, whether in mathematics, privacy, or life, the lesson is the same: there are always tradeoffs. And that’s not a tragedy; it’s the very structure that gives our choices value.

I hope these ideas shape how I live, teach, and do research going forward. In my work on privacy, I’m constantly reminded that (almost) every theorem comes with a cost and that understanding those costs makes systems more honest and human. In education, through NaijaCoder, I see the same principle: every bit of growth for a student comes from someone’s investment of time and care.

Developing lower bounds isn’t just a mathematical pursuit. It’s a philosophy of life, one that teaches patience, realism, and gratitude. The world is full of limits, but within those limits, we can still create beauty, meaning, and progress: one bounded step at a time.

When Influence Scores Betray Us: Efficiently Attacking Memorization Scores

tl;dr 👉 We just put out work on attacking influence-based estimators in data markets. The student lead (who did most of the work) is Tue Do! Check it out. Accurate models are not enough. If the auditing tools we rely on can be fooled, then the trustworthiness of machine learning is on shaky ground.

Modern machine learning models are no longer evaluated solely by their training or test accuracy. Increasingly, we ask:

  • Which training examples influenced a particular prediction?
  • How much does the model rely on each data point?
  • Which data are most valuable, or most dangerous, to keep?

Answering these questions requires influence measures, which are mathematical tools that assign each training example a score reflecting its importance or memorization within the model. These scores are already woven into practice: they guide data valuation (identifying key examples), dataset curation (removing mislabeled or harmful points), privacy auditing (tracking sensitive examples), and even data markets (pricing user contributions).

But here lies the problem: what if these influence measures themselves can be attacked? In our new paper, Efficiently Attacking Memorization Scores, we show that they can. Worse, the attacks are not only possible but efficient, targeted, and subtle.


Memorization Scores: A Primer

A memorization score quantifies the extent to which a training example is “remembered” by a model. Intuitively:

  • A point has a high memorization score if the model depends heavily on it (e.g., removing it would harm performance on similar examples).
  • A low score indicates the model has little reliance on the point.

Formally, scores are often estimated through:

  • Leave-one-out retraining (how accuracy changes when a point is removed).
  • Influence functions (approximating parameter sensitivity).
  • Gradient similarity measures (alignment between gradients of a point and test loss).

Because these estimators are computationally heavy, practical implementations rely on approximations, which (one could argue) introduce new fragilities.
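To give a feel for the third approach (and for why approximations are tempting), here is a toy sketch under my own simplifying assumptions: for a logistic-regression model, score each training point by how well its loss gradient aligns with a test point’s gradient, in the spirit of TracIn-style estimators; no retraining required.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Gradient of the logistic loss at one example (x, y), with y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - y) * x

def gradient_similarity_scores(w, X_train, y_train, x_test, y_test):
    """Score each training point by the alignment (dot product) between its
    gradient and the test point's gradient: a cheap influence proxy."""
    g_test = logistic_grad(w, x_test, y_test)
    return np.array([logistic_grad(w, x, y) @ g_test
                     for x, y in zip(X_train, y_train)])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)
w = rng.normal(size=5)  # stand-in for trained weights
scores = gradient_similarity_scores(w, X, y, X[0], y[0])
print("highest-scoring training index:", int(np.argmax(scores)))
```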


The Adversarial Setting

We consider an adversary whose goal is to perturb training data so as to shift memorization scores in their favor. Examples include:

  • Data market gaming (prime motivation): A seller inflates the memorization score of their data to earn higher compensation.
  • Audit evasion: A harmful or mislabeled point is disguised by lowering its score.
  • Curation disruption: An attacker perturbs examples so that automated cleaning pipelines misidentify them as low-influence.

Constraints:

An attack could satisfy a number of conditions, but we focus on two:

  1. Efficiency: The method must scale to modern, large-scale datasets.
  2. Plausibility: Model accuracy should remain intact, so the manipulation is not caught by standard validation checks.

The Pseudoinverse Attack

Our core contribution is a general, efficient method called the Pseudoinverse Attack:

  1. Linearize: Memorization scores, though nonlinear in general, can be locally approximated as linear functions of input perturbations. This mirrors how influence functions linearize parameter changes.
  2. Invert: We solve an inverse problem (specified in the paper): we compute approximate gradients that link input perturbations to score changes, use the pseudoinverse to find efficient perturbations, and apply them selectively to target points.

This avoids full retraining for each perturbation and yields perturbations that are both targeted and efficient.
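The full construction is in the paper; what follows is only a schematic of the linear-algebra core, with all names and shapes my own invention. Given an approximate Jacobian J that maps small input perturbations to score changes, the Moore-Penrose pseudoinverse yields the minimum-norm perturbation achieving a desired score shift.

```python
import numpy as np

def pseudoinverse_perturbation(J, delta_scores):
    """Minimum-norm perturbation delta_x with J @ delta_x ~= delta_scores.

    J:            (num_scores, num_features) approximate Jacobian of
                  memorization scores w.r.t. the target point's features.
    delta_scores: desired change in each score.
    """
    return np.linalg.pinv(J) @ delta_scores

rng = np.random.default_rng(0)
J = rng.normal(size=(3, 10))         # toy Jacobian: 3 scores, 10 features
target = np.array([0.5, 0.0, 0.0])   # inflate the first score, hold the rest
delta_x = pseudoinverse_perturbation(J, target)
print("achieved score change:", np.round(J @ delta_x, 3))
print("perturbation norm:", round(float(np.linalg.norm(delta_x)), 3))
```

Because the pseudoinverse returns the least-squares, minimum-norm solution, the resulting perturbation is small, which is exactly what keeps the attack plausible.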


Validation

We validate across image classification tasks (e.g., CIFAR benchmarks) with standard architectures (CNNs, ResNets).

Key Findings

  1. High success rate: Target scores can be reliably increased or decreased.
  2. Stable accuracy: Overall classification performance remains essentially unchanged.
  3. Scalability: The attack works even when applied to multiple examples at once.

Example (Score Inflation): A low-memorization image (e.g., a benign CIFAR airplane) is perturbed. After retraining, its memorization score jumps into the top decile, without degrading accuracy on other examples. This demonstrates a direct subversion of data valuation pipelines.


Why This Is Dangerous

The consequences ripple outward:

  • Data markets: Compensation schemes based on memorization become easily exploitable.
  • Dataset curation: Automated cleaning fails if adversaries suppress scores of mislabeled or harmful points.
  • Auditing & responsibility: Legal or ethical frameworks built on data attribution collapse under adversarial pressure.
  • Fairness & privacy: Influence-based fairness assessments are no longer trustworthy.

If influence estimators can be manipulated, the entire valuation-based ecosystem is at risk.

Conclusion

This work sits at the intersection of adversarial ML and interpretability:

  • First wave: Adversarial examples, i.e., perturbing inputs to fool predictions.
  • Second wave: Data poisoning and backdoor attacks, i.e., perturbing training sets to corrupt models.
  • Third wave (our focus): Attacks on the auditing layer, i.e., perturbing training sets to corrupt pricing and interpretability signals without harming predictive accuracy.

This third wave is subtle but potentially more damaging: if we cannot trust influence measures, then even “good” models become opaque and unaccountable. As machine learning moves toward explainability and responsible deployment, securing the interpretability layer is just as critical as securing models themselves.

Our paper reveals a new adversarial frontier: efficiently manipulating memorization scores.

  • We introduce the Pseudoinverse Attack, an efficient, targeted method for perturbing training points to distort influence measures.
  • We show, supported by theory and experiments, that memorization scores are highly vulnerable, even under small, imperceptible perturbations.
  • We argue that this undermines trust in data valuation, fairness, auditing, and accountability pipelines.

The Talking Drum as a Communication Channel

We just wrapped up Week 1 of my UIUC course, ECE598DA: Topics in Information-Theoretic Cryptography. The class introduces students to how tools from information theory can be used to design and analyze both privacy applications and foundational cryptographic protocols. Like many courses in privacy and security, we began with the classic one-time pad as our entry point into the fascinating world of secure communication.
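For readers meeting it for the first time, here is a minimal one-time pad sketch (the standard textbook construction, not code from the course): XOR the message with a uniformly random key of equal length, and XOR again with the same key to decrypt.

```python
import secrets

def otp_xor(message: bytes, key: bytes) -> bytes:
    """XOR each message byte with the corresponding key byte.

    Perfect secrecy requires the key to be uniformly random, as long as
    the message, and never reused. Encryption and decryption are the
    same operation, since (m ^ k) ^ k == m.
    """
    assert len(key) == len(message)
    return bytes(m ^ k for m, k in zip(message, key))

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))  # fresh uniform key, used once
ct = otp_xor(msg, key)
assert otp_xor(ct, key) == msg
print(ct.hex())
```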

We also explored another ‘tool’ for communication: the talking drum. This musical tradition offers a striking example of how information can be encoded, transmitted, and understood only by those familiar with the underlying code. In class, I played a video of a master drummer to bring this idea to life.

What Are Talking Drums?

Talking drums, especially those like the Yoruba dùndún, are traditional African hourglass‑shaped percussion instruments prized for their ability to mimic speech. Skilled drummers can vary pitch and rhythm to convey tonal patterns, effectively transmitting messages over short distances.

  • Speech surrogacy: The drum replicates the microstructure of tonal languages by adjusting pitch and rhythm, embodying what researchers call a “speech surrogate”.
  • Cultural ingenuity: Historically, these drums served as everyday communication tools, not merely for music or rituals but for sharing proverbs, announcements, secure messages, and more.

Here’s one of the exercises I gave students in Week 1:

Exercise: Talking drums. Chapter 1 of Gleick’s The Information highlights the talking drum as an early information technology: a medium that compresses, encodes, and transmits messages across distance. Through a communications theory lens, can you describe the talking drum as a medium that achieves a form of secure communication?

And here’s a possible solution:

African talking drums (e.g., the Yoruba “dùndún”) reproduce the pitch contours and tonal patterns of speech. Since many West African languages are tonal, the drum can convey a sentence’s structure without its literal words.

  • Encoding: A spoken sentence is mapped into rhythmic and tonal patterns.
  • Compression: The drum strips away vowels and consonants, leaving tonal “skeletons.”
  • Security implication: To an outsider unfamiliar with the tonal code or local idioms, the message is incomprehensible. In effect, the drum acts as an encryption device where the key is cultural and linguistic context.

There are a few entities to model:

  • Source: Message in natural language (tonal West African language, e.g., Yoruba).
  • Encoder: Drummer maps source to a drummed signal using tonal contours and rhythmic patterns.
  • Channel: Physical propagation of drum beats across distance, subject to noise (wind, echo, competing sounds).
  • Legitimate receiver: Villager fluent in both the spoken language and cultural conventions.
  • Adversary: Outsider (colonial administrator, rival tribe, foreign merchant) who hears the same signal but lacks full knowledge of mapping or redundancy rules.

Let X denote a message in a tonal language (e.g., Yoruba). A drummer acts as an encoder E mapping X to a drummed signal S = E(X, K), where K denotes shared cultural/linguistic knowledge (idioms, proverbs, discourse templates) known to legitimate receivers but not to outsiders. The signal S traverses a physical channel C and is received as Y_R by insiders and as Y_A by an adversary (outsider). Decoders D_R and D_A attempt to reconstruct X: the insider computes an estimate X̂_R = D_R(Y_R, K), while the adversary computes X̂_A = D_A(Y_A) without access to K. The scheme works when the insider recovers X with high probability while the adversary’s estimate is little better than a guess; the redundancy of stock phrases and proverbs plays the role of an error-correcting code against channel noise.
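To make the model concrete, here is a toy simulation of this pipeline; the tone alphabet, codebook, and noise model are hypothetical simplifications, not a faithful rendering of dùndún practice.

```python
import random

# Toy shared knowledge K: which tonal skeleton stands for which stock phrase.
# A real drummer's K spans idioms, proverbs, and discourse templates.
K = {"HLHLH": "come to the market", "LLHHL": "the chief has arrived"}

def encode(message: str, codebook: dict) -> str:
    """Encoder E: map a message to its high/low (H/L) pitch contour."""
    inverse = {phrase: tones for tones, phrase in codebook.items()}
    return inverse[message]

def channel(signal: str, flip_prob: float = 0.1) -> str:
    """Channel C: wind and echo occasionally flip a tone."""
    flip = {"H": "L", "L": "H"}
    return "".join(flip[t] if random.random() < flip_prob else t for t in signal)

def decode(received: str, codebook: dict) -> str:
    """Insider decoder D_R: nearest known pattern in Hamming distance.
    The outsider D_A has no codebook, so this step is unavailable to them."""
    nearest = min(codebook,
                  key=lambda pat: sum(a != b for a, b in zip(pat, received)))
    return codebook[nearest]

random.seed(1)
X = "come to the market"
Y = channel(encode(X, K))
print(f"signal heard by all: {Y}")
print(f"insider decodes:     {decode(Y, K)}")  # adversary hears only tones
```

The redundancy in the codebook (patterns far apart in Hamming distance) is what lets the insider correct channel noise, while the adversary, lacking K, cannot even begin.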