Privacy and Security in Data Markets

At SIGMOD 2025, my collaborators and I are scheduled to give a tutorial on Privacy and Security in Distributed Data Markets. The accompanying paper summarizes the core material we will present.

Abstract

Data markets play a pivotal role in modern industries by facilitating the exchange of data for predictive modeling, targeted marketing, and research. However, as data becomes a valuable commodity, privacy and security concerns have grown, particularly regarding the personal information of individuals. This tutorial explores the privacy and security issues that arise when integrating different data sources in data market platforms. To motivate the importance of enforcing privacy requirements, we discuss attacks on data markets, focusing on membership inference and reconstruction attacks. We also discuss security vulnerabilities in decentralized data marketplaces, including adversarial manipulations by buyers or sellers. We provide an overview of privacy and security mechanisms designed to mitigate these risks. To minimize the trust required of buyers and sellers, we focus on distributed protocols. Finally, we conclude with opportunities for future research on understanding and mitigating privacy and security concerns in distributed data markets.

Schedule

Part I: Survey on Data Markets

Part II: Privacy and Security Risks

Part III: Privacy-Preserving Technologies and Security Tools

Part IV: Regulatory Considerations

Part V: Open Problems & Future Work

Part VI: Q & A

Leading up to the conference, I’m planning to post on different aspects of the tutorial.

Fall 2025: Topics in Information-Theoretic Cryptography

Since the beginning of this year, I have been developing a course on “Topics in Information-Theoretic Cryptography”. Recently, the course was approved for Fall 2025. I’m very excited to share some research with undergraduate/graduate students! Below, I list some relevant information for the proposed course.

Course Number and Title

ECE598DA: Topics in Information-Theoretic Cryptography

Description

In this course, we will study foundational and recent work on the use of information theory to design and analyze cryptographic protocols. We will begin by studying privacy attacks which motivate strong privacy and security definitions. Then, we will explore the basics of differential privacy and study some core works on zero-knowledge proofs. Finally, we will explore various applications, including watermarking of generative models.

Recommended Textbooks

  • Introduction to Cryptography with Coding Theory. By Wade Trappe and Lawrence C. Washington.
  • Tutorials on the Foundations of Cryptography. Edited by Yehuda Lindell.

Syllabus

Week 1: Introduction: motivations, one-time pad review, review of probability theory

Week 2: Attacks and Composition Theorems for Differential Privacy

Week 3: Standard Mechanisms for Differential Privacy

Week 4: Information-Theoretic Lower Bounds for Differential Privacy

Week 5: Differentially Private Statistical Estimation and Testing

Week 6: Zero-Knowledge Proofs

Week 7: Statistical Zero-Knowledge Proofs: Part I

Week 8: Statistical Zero-Knowledge Proofs: Part II

Week 9: Multi-Party Computation

Week 10: Multi-Party and Computational Differential Privacy

Week 11: Code-Based Cryptography: Part I

Week 12: Code-Based Cryptography: Part II

Week 13: More Applications

  • Watermarking of Generative Models
  • Proof Systems for Machine Learning
  • Bounded-Storage Cryptography
  • Quantum Cryptography

Week 14: Project Presentations

Watermarking Language Models

Lav Varshney and I recently released an IACR preprint on how to analyze unforgeable watermarking procedures for generative agents. Our approach relies on cryptographic techniques and computational entropy notions.

Abstract

In this work, we construct distortion-free and unforgeable watermarks for language models and generative agents. The watermarked output cannot be forged by an adversary, nor can it be removed without significantly degrading model output quality. Moreover, the watermarked output is distortion-free: the watermarking algorithm does not noticeably change the quality of the model output and, without the public detection key, no efficient adversary can distinguish watermarked outputs from unwatermarked ones. The core of the watermarking schemes involves embedding a message and a publicly verifiable digital signature in the generated model output. The message and signature can be extracted during the detection phase and verified by any authorized entity that has the public key. We show that, assuming the standard cryptographic assumption of one-way functions, we can construct distortion-free and unforgeable watermark schemes. Our framework relies on analyzing the inaccessible entropy of the watermarking schemes based on computational entropy notions derived from the existence of one-way functions.
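To give a feel for the "publicly verifiable digital signature" ingredient mentioned in the abstract, here is a minimal sketch of a Lamport one-time signature in Python. This is not the construction from the preprint, and it does not embed anything in model output; it only illustrates how a signature can be built from a function assumed to be one-way and then checked by anyone holding the public key. The names `keygen`, `sign`, and `verify` are hypothetical, and SHA-256 is used as a stand-in for the one-way function (hashing the message with it additionally assumes collision resistance).

```python
import hashlib
import secrets

N_BITS = 256  # number of message-digest bits that get signed


def H(data: bytes) -> bytes:
    """Stand-in for a one-way function (assumption: SHA-256)."""
    return hashlib.sha256(data).digest()


def keygen():
    """Return (secret_key, public_key) for a single signature."""
    # Secret key: one pair of random 32-byte strings per digest bit.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32))
          for _ in range(N_BITS)]
    # Public key: the image of every secret string under H.
    pk = [(H(s0), H(s1)) for s0, s1 in sk]
    return sk, pk


def digest_bits(message: bytes):
    """Hash the message and return its digest as a list of 256 bits."""
    d = H(message)
    return [(d[i // 8] >> (7 - i % 8)) & 1 for i in range(N_BITS)]


def sign(sk, message: bytes):
    """Reveal, for each digest bit, the secret string selected by that bit."""
    return [sk[i][bit] for i, bit in enumerate(digest_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    """Check each revealed string against the matching public-key entry."""
    if len(signature) != N_BITS:
        return False
    bits = digest_bits(message)
    return all(H(sig) == pk[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, bits)))


if __name__ == "__main__":
    sk, pk = keygen()
    sig = sign(sk, b"example model output")
    print(verify(pk, b"example model output", sig))
    print(verify(pk, b"tampered model output", sig))
```

Each key pair in this sketch must be used for one message only, since every signature reveals half of the secret key. Verification needs only `pk`, which is the sense in which the signature is publicly verifiable.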