A Tutorial on Secure Data Markets

tl;dr We’re hosting a tutorial at SIGMOD 2025 in Berlin on Privacy and Security in Distributed Data Markets. Come join us!


The Origins: From ChatGPT to Data Markets

It was 2022, and ChatGPT had just exploded onto the scene. A few weeks later, I found myself asking a simple question: What is the value of the data used to train these powerful language models?

Obviously, I’m not the first person to ask that question, but it led me to explore several related topics. At the time, Eugene Wu pointed me toward a growing thread in the database community: data markets, structured platforms where data can be bought, sold, and exchanged just like goods or services. I soon discovered that data markets are a key research direction for Raul Castro Fernandez and others, especially as data becomes one of the most valuable digital commodities.

But what good is a data market if the very act of exchanging data exposes both sellers and buyers to liability? If transactions in the market result in devastating privacy breaches, who’s going to participate? The stakes are not hypothetical. Consider:

  • In June 2025, the UK’s Information Commissioner’s Office fined DNA testing firm 23andMe £2.31 million for failing to prevent a massive 2023 data breach that exposed the genetic information of millions. (The Guardian)
  • In May 2025, Google agreed to pay Texas about $1.38 billion to settle privacy claims brought by the state, a sign that Big Tech is not above the law when it comes to misusing user data. (Times of India)

Privacy and security are not just afterthoughts. They are prerequisites for data markets to thrive.


The Research Front: From Theory to Practice

The good news is that the academic community is rising to this challenge. At Columbia University, Zach Huang’s thesis work has focused on designing private data markets, aiming to ensure data utility while enforcing strong privacy guarantees. Meanwhile, Jiaxiang Liu has been exploring causal search systems and thinking hard about how they might be extended to enforce privacy and auditability by design.

These lines of work highlight a crucial pivot in how we should think about data ecosystems: not only what is exchanged, but how it is protected, tracked, and regulated.


Come Learn With Us: SIGMOD 2025 Tutorial

That’s why we’re organizing a tutorial at SIGMOD 2025 in Berlin, focusing on the intersection of privacy, security, and data markets. We believe that privacy-preserving and secure systems must be core to the infrastructure of data marketplaces (and not just bolted on afterward!).

The tutorial is split into five parts:

Part I: Survey on Data Markets

An overview of recent work on data valuation, pricing mechanisms, data provenance, and marketplace design. We’ll survey both academic systems and real-world platforms.
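To make "data valuation" a little more concrete, here is a minimal sketch of one well-known approach: crediting each seller with their Shapley value, the average marginal contribution they make across all orderings of sellers. The seller names, the toy datasets, and the coverage-based utility function are assumptions made up for illustration. They do not come from the tutorial material, and exact enumeration like this only works for a handful of sellers.

```python
from itertools import permutations
from math import factorial

# Hypothetical toy market: each seller owns a set of record IDs.
DATA = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4}}


def coverage(coalition):
    """Toy utility (an assumption): number of distinct records a coalition holds."""
    return len(set().union(*(DATA[s] for s in coalition)))


def shapley_values(sellers, utility):
    """Exact Shapley value per seller, averaging marginal gains over all orderings."""
    totals = {s: 0.0 for s in sellers}
    for order in permutations(sellers):
        coalition = frozenset()
        for s in order:
            with_s = coalition | {s}
            # Marginal contribution of s when joining the sellers before it.
            totals[s] += utility(with_s) - utility(coalition)
            coalition = with_s
    n_orders = factorial(len(sellers))
    return {s: total / n_orders for s, total in totals.items()}


print(shapley_values(["A", "B", "C"], coverage))
# {'A': 2.5, 'B': 1.0, 'C': 0.5}
```

In this toy game each record's unit of value is split evenly among the sellers who hold it, so the values add up to the 4 distinct records in the market. Real valuation methods typically approximate this quantity by sampling, since the number of orderings grows factorially.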

Part II: Privacy and Security Risks

We’ll walk through concrete case studies of real-world privacy failures in data markets (including, perhaps, those mentioned above), and examine the technical and legal gaps that led to them.

Part III: Privacy-Preserving Technologies and Security Tools

We will explore cutting-edge tools that can enforce privacy and security in data transactions:

  • Differential privacy
  • Secure multiparty computation
  • Federated learning
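As a taste of the first tool on this list, here is a minimal sketch of the Laplace mechanism, a standard way to release a count with epsilon-differential privacy. The function names, the toy records, and the choice of epsilon are assumptions for illustration only. This is not hardened code: it uses Python's non-cryptographic random module and floating-point noise, both of which a production system would need to replace.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Release a noisy count of the records matching `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical usage: a seller answers "how many customers are over 40?"
rng = random.Random()
ages = [23, 35, 41, 52, 67, 29, 44]
print(private_count(ages, lambda age: age > 40, epsilon=0.5, rng=rng))
```

Smaller values of epsilon mean more noise and stronger privacy, and larger values mean the reverse. That tension between utility and privacy is exactly the trade-off a data market has to price.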

Part IV: Regulatory Considerations

A guided discussion on the legal frameworks that govern data usage (e.g., GDPR, Title 13), recent enforcement actions, and what upcoming regulations might mean for data markets.

Part V: Open Problems & Future Directions

We’ll close with an interactive session on open research questions:

  • How do we balance utility and privacy in high-value data?
  • Can we build incentive-compatible marketplaces that respect user rights?
  • What new threats will arise in markets built on AI-generated data?

Why This Matters

As AI systems become more powerful, the value (and vulnerability) of data becomes ever more pronounced. If we want a future where data can be shared, monetized, and reused responsibly, we must take privacy and security seriously.

Whether you’re a researcher, practitioner, policymaker, or simply curious about the future of data, we hope you’ll join us for this tutorial at SIGMOD 2025.

👉 See you in Berlin!

Academia is Not Perfect But It Can Be Transformative


[Photo: my office at the University of Illinois Urbana-Champaign. Come say hi!]

Academia is not perfect. But for many of us, it remains the most accessible and reliable system we have for helping people, especially students from disadvantaged backgrounds, achieve economic, social, and intellectual mobility.

This truth came into sharp focus again as I listened to the powerful 2025 Harvard commencement address by a fellow immigrant, born in Ethiopia. You can watch it here. Like the speaker, I came to the United States as an “alien,” driven by hope, hard work, and a belief that education could change lives. And it has. I’m now a Harvard-trained professor, and I can honestly say that academia has transformed not only my life but the lives of my family and friends. Thank you, academia!

But I didn’t always feel this way.

There have been moments when I was ready to walk away from it all, when the “ivory tower” felt more like a fortress of burnout, ego, and misplaced priorities than a place for learning and growth. Let me share one example. It’s not the only one, but it’s seared into my memory.

Fall 2017: My Worst Semester in Graduate School

It was my hardest semester. Many friends (mostly at Harvard and MIT) were struggling. Struggling with research, personal issues, mental health. The pressure was suffocating. Then in October, something devastating happened. An amazing Harvard undergraduate student whom I knew passed away. It shook the community. You can read one account of that time in this Crimson article. But the article only scratches the surface. Behind closed doors, people were hurting.

The very next day, I remember writing to a colleague: “I’m done with academia.” I was angry, heartbroken, and disillusioned. How could we call this a place of learning when the well-being of students seemed like an afterthought?

At one point, I told a friend: “Even if I finish my PhD, I can’t become a professor. There’s too much blood on the hands of professors. I don’t want to be part of a system that prioritizes awards, grants, and prestige over the health and wellness of students.”

That was my truth then.

The System Is Flawed—But We’re Also Part of It

The reality is that academia is both a system and a community. And like all systems, it reflects the values of those who participate in it. Professors, students, administrators: we all carry some responsibility. Systemic change requires constant reform. It also requires courage.

I often wonder why so many brilliant people end up in industry (e.g., Google Brain), seemingly leaving academia behind. For some, it’s about opportunity. For others, it’s about survival.

But over time, my perspective has shifted. I’ve encountered professors who are deeply committed to their students’ growth and well-being. I’ve seen programs and projects that genuinely change lives. And I’ve come to realize that as long as I can stay true to my principles, and use my position to help others, I’m okay with failing by some external metric. In fact, I welcome it.

Because anyone who isn’t failing at something is probably not trying hard enough.

Reaching Out and Reaching Forward

If you’re struggling in school, you’re not alone. Maybe you’re in school to support your family. That’s valid. Maybe you’re there because you love to learn. That’s valid too. There’s no “wrong” reason to be in school, only your own journey to make sense of.

As a professor, my job isn’t just to teach. It’s to help you grow and to build a support network around you. There will be hard times. But we can face them together.

Over the past few years, I’ve found the deepest meaning in my work with NaijaCoder, an initiative aimed at empowering young people in Nigeria through technical education. Watching our alumni grow and excel has been one of the greatest joys of my life. It reminds me that education is not about titles or tenure. It’s about transformation.

A Hopeful Commitment

Yes, the system needs fixing. But I haven’t given up on academia. Not because it’s perfect, but because it’s possible to make it better. Because there are still people who care more about students than status. Because in every classroom, in every lab, in every student from Lagos to Urbana-Champaign, I see potential.

To every student reading this: Relax, and reach out. You don’t have to do it alone. And to every professor: Let’s do better. Our legacy is not in our publications, but in the people we uplift.


P.S. If you’re in a tough place right now, please know that it’s OK to ask for help. Failing isn’t the end. It’s often the beginning of something more honest, more human, and more lasting.

Privacy and Security in Data Markets

At SIGMOD 2025, my collaborators and I are scheduled to give a tutorial on Privacy and Security in Distributed Data Markets. The core material that will be presented is summarized in the accompanying paper.

Abstract

Data markets play a pivotal role in modern industries by facilitating the exchange of data for predictive modeling, targeted marketing, and research. However, as data becomes a valuable commodity, privacy and security concerns have grown, particularly regarding the personal information of individuals. This tutorial explores privacy and security issues that arise when integrating different data sources in data market platforms. To motivate the need for enforcing privacy requirements, we discuss attacks on data markets, focusing on membership inference and reconstruction attacks. We also discuss security vulnerabilities in decentralized data marketplaces, including adversarial manipulations by buyers or sellers. We provide an overview of privacy and security mechanisms designed to mitigate these risks. To minimize the trust required of buyers and sellers, we focus on distributed protocols. We conclude with opportunities for future research on understanding and mitigating privacy and security concerns in distributed data markets.
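To give a flavor of what a distributed protocol with minimal trust can look like, here is a minimal sketch of additive secret sharing, a classic building block of secure multiparty computation. Several sellers learn the sum of their private values without any of them revealing an individual value. The modulus, the function names, and the single-process simulation of message passing are all assumptions for illustration. The sketch also assumes parties follow the protocol honestly, and it is not a protocol taken from the tutorial paper.

```python
import secrets

# Public modulus (an assumption); inputs must be non-negative and sum below it.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n random shares that add up to it modulo MODULUS."""
    if n_parties < 2:
        raise ValueError("need at least two parties")
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate the protocol: each party shares its value, then sums what it receives."""
    n = len(private_values)
    # Row i holds the shares party i sends out; column j is what party j receives.
    sent = [share(v, n) for v in private_values]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


# Hypothetical usage: three sellers jointly compute their total record count.
print(secure_sum([120, 75, 305]))  # 500
```

Any single share looks uniformly random on its own, so a party learns nothing about another party's input beyond what the final sum reveals.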

Schedule

Part I: Survey on Data Markets

Part II: Privacy and Security Risks

Part III: Privacy-Preserving Technologies and Security Tools

Part IV: Regulatory Considerations

Part V: Open Problems & Future Work

Part VI: Q & A

Leading up to the conference, I’m planning to post on different aspects of the tutorial.