How can you understand a dataset too big to store, or even to read in full? Sublinear algorithms extract meaningful results from massive data using time or space that grows more slowly than the data itself, drawing on beautiful mathematics to tackle problems that seem too big to handle head-on.
How can you learn from sensitive data without revealing too much about any individual? Differential privacy enables meaningful analysis while protecting each user’s information, and it is increasingly recognized as a vital tool for the responsible, ethical use of data.
How can you harness data to solve real problems? Foundations of data science equips you with techniques that are both reliable and implementable, helping ideas move from concept to application and turning insights into actionable results.
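Designing streaming algorithms that remain accurate even when the input stream is chosen adaptively by an adversary. This work focuses on defending against "black-box" and "white-box" attacks in big data models: in the black-box setting the adversary observes only the algorithm's outputs, while in the white-box setting it can also inspect the algorithm's internal state.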
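One simple, well-known route to robustness is to use no randomness at all: a deterministic algorithm has no hidden random bits for an adversary to learn, so its worst-case guarantee also holds for adaptively chosen streams. The sketch below illustrates this with the classical Misra-Gries heavy-hitters summary. It is a textbook example chosen for illustration, not the group's techniques; randomized sketches are often far smaller than any deterministic alternative, which is why making them robust is a research question. The function name `misra_gries` is our own.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Deterministic heavy-hitters summary using at most k counters.

    For every item x with true frequency f(x) in a stream of length n,
    the returned estimate (0 if x is absent) lies in [f(x) - n/(k+1), f(x)].
    Because no randomness is used, this holds even for adaptive streams.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # All counters are full: decrement every counter and discard x.
            # This step cancels k + 1 distinct occurrences at once, so it can
            # happen at most n / (k + 1) times.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, on a stream of six `a`s, three `b`s, and then `c`, `d`, `e` with `k = 2`, the summary is `{"a": 3}`: the true count 6 is underestimated by 3, within the bound 12 / 3 = 4.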
Developing algorithmic frameworks that enable high-utility data analysis while providing rigorous privacy guarantees. This research focuses on the trade-offs between accuracy, computational efficiency, and privacy budget in various differential privacy models.
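The accuracy/privacy-budget trade-off can be seen in its simplest form in the standard Laplace mechanism for a counting query, sketched below. This is a textbook mechanism, not the frameworks described above; the helper names `laplace_noise` and `private_count` are hypothetical. The sketch assumes each individual contributes one record, so adding or removing a person changes the count by at most 1 (sensitivity 1), and the noise scale is therefore 1 / epsilon.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    Assumes one record per individual, so the count has sensitivity 1 and
    the noise scale is 1 / epsilon: a smaller privacy budget means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

With `epsilon = 0.5` the expected absolute error of the released count is 2; with `epsilon = 2.0` it is 0.5. Naive floating-point Laplace sampling like this is known to leak information through rounding artifacts, so deployed systems use hardened samplers.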
Developing mathematical frameworks for private and scalable data generation. This research uses geometric coresets and algebraic low-rank approximation to construct high-utility synthetic datasets that capture the underlying structure of massive datasets while maintaining rigorous guarantees.
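As a small illustration of the algebraic side only, the sketch below computes a rank-1 approximation of a matrix by power iteration. It involves no privacy mechanism and no coreset construction; it just shows the classical fact (Eckart-Young) that the truncated SVD is the best low-rank approximation in Frobenius norm. It assumes a dense list-of-lists matrix whose top singular value is strictly larger than the second; the function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(a: Matrix, x: List[float]) -> List[float]:
    """Compute A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a: Matrix, y: List[float]) -> List[float]:
    """Compute A^T y."""
    out = [0.0] * len(a[0])
    for row, yi in zip(a, y):
        for j, aij in enumerate(row):
            out[j] += aij * yi
    return out


def rank_one_approx(a: Matrix, iters: int = 100, seed: int = 0) -> Matrix:
    """Approximate the best rank-1 approximation of A in Frobenius norm.

    Power iteration on A^T A converges to the top right singular vector v
    (assuming a strict gap to the second singular value); A v v^T is then
    the rank-1 truncated SVD.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(a[0]))]
    for _ in range(iters):
        v = rmatvec(a, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            # A maps the current vector to zero (e.g. A is the zero matrix).
            return [[0.0] * len(v) for _ in a]
        v = [x / norm for x in v]
    av = matvec(a, v)
    return [[ai * vj for vj in v] for ai in av]


def frobenius(a: Matrix) -> float:
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))
```

For `[[3.0, 0.0], [0.0, 1.0]]` the result is approximately `[[3.0, 0.0], [0.0, 0.0]]`, and the Frobenius error is approximately 1, the discarded second singular value.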
Honghao Lin, Zhao Song, David P. Woodruff, Shenghao Xie, Samson Zhou, SODA, 2026.
Vladimir Braverman, Sumegha Garg, Chen Wang, David P. Woodruff, Samson Zhou, SODA, 2026.
Vladimir Braverman, Prathamesh Dharangutte, Shaofeng H.-C. Jiang, Hoai-An Nguyen, Chen Wang, Yubo Zhang, Samson Zhou, ICML, 2025.
David P. Woodruff, Shenghao Xie, Samson Zhou, PODS, 2025.