A Tutorial on Secure Data Markets

tl;dr We’re hosting a tutorial at SIGMOD 2025 in Berlin on Privacy and Security in Distributed Data Markets. Come join us!


The Origins: From ChatGPT to Data Markets

It was 2022, and ChatGPT had just exploded onto the scene. A few weeks later, I found myself asking a simple question: what is the value of the data used to train these powerful language models?

I was hardly the first person to ask that question, but it led me to explore several related topics. At the time, Eugene Wu pointed me toward a growing thread in the database community: data markets, structured platforms where data can be bought, sold, and exchanged just like goods or services. I soon discovered that data markets are a key research direction for Raul Castro Fernandez and others, especially as data becomes one of the most valuable digital commodities.

But what good is a data market if the very act of exchanging data exposes both sellers and buyers to liability? If transactions in the market result in devastating privacy breaches, who is going to participate? The stakes are not hypothetical. Consider:

  • In June 2025, the UK’s Information Commissioner’s Office fined DNA testing firm 23andMe £2.31 million (about $3.1 million) for failing to prevent a massive 2023 data breach that exposed the genetic information of millions. (The Guardian)
  • In May 2025, Google agreed to pay Texas $1.38 billion to settle a major data-privacy lawsuit, showing that Big Tech is not above the law when it comes to misusing user data. (Times of India)

Privacy and security are not just afterthoughts. They are prerequisites for data markets to thrive.


The Research Front: From Theory to Practice

The good news is that the academic community is rising to this challenge. At Columbia University, Zach Huang’s thesis work has focused on designing private data markets, aiming to ensure data utility while enforcing strong privacy guarantees. Meanwhile, Jiaxiang Liu has been exploring causal search systems and thinking hard about how they might be extended to enforce privacy and auditability by design.

These lines of work highlight a crucial pivot in how we should think about data ecosystems: not only what is exchanged, but how it is protected, tracked, and regulated.


Come Learn With Us: SIGMOD 2025 Tutorial

That’s why we’re organizing a tutorial at SIGMOD 2025 in Berlin, focusing on the intersection of privacy, security, and data markets. We believe that privacy-preserving and secure systems must be core to the infrastructure of data marketplaces (and not just bolted on afterward!).

The tutorial is split into five parts:

Part I: Survey on Data Markets

An overview of recent work on data valuation, pricing mechanisms, data provenance, and marketplace design. We’ll survey both academic systems and real-world platforms.

Part II: Privacy and Security Risks

We’ll walk through concrete case studies of real-world privacy failures in data markets (including, perhaps, those mentioned above), and examine the technical and legal gaps that led to them.

Part III: Privacy-Preserving Technologies and Security Tools

We will explore cutting-edge tools that can enforce privacy and security in data transactions:

  • Differential privacy
  • Secure multiparty computation
  • Federated learning
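To give a flavor of the first tool on that list, here is a minimal sketch of the Laplace mechanism, one of the standard building blocks of differential privacy, applied to a counting query. The function names (`laplace_noise`, `private_count`) and the toy data are hypothetical and chosen only for illustration; they are not taken from the tutorial material. This is a teaching sketch, not a production implementation: naive floating-point noise sampling like this is known to have subtle weaknesses, so real deployments should rely on a vetted library.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    1 / epsilon gives epsilon-differential privacy for a single release.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy dataset: ages of six data contributors.
    ages = [23, 35, 41, 52, 67, 29]
    noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0)
    print(f"Noisy count of contributors aged 40+: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means a more accurate answer. That dial is exactly the utility-versus-privacy tension that a seller and a buyer in a data market would have to negotiate.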

Part IV: Regulatory Considerations

A guided discussion on the legal frameworks that govern data usage (e.g., the EU’s GDPR and Title 13 of the U.S. Code), recent enforcement actions, and what upcoming regulations might mean for data markets.

Part V: Open Problems & Future Directions

We’ll close with an interactive session on open research questions:

  • How do we balance utility and privacy in high-value data?
  • Can we build incentive-compatible marketplaces that respect user rights?
  • What new threats will arise in markets built on AI-generated data?

Why This Matters

As AI systems become more powerful, data grows both more valuable and more vulnerable. If we want a future where data can be shared, monetized, and reused responsibly, we must take privacy and security seriously.

Whether you’re a researcher, practitioner, policymaker, or simply curious about the future of data, we hope you’ll join us for this tutorial at SIGMOD 2025.

👉 See you in Berlin!