Research Group

Focus Areas

Sublinear Algorithms

How can you understand a dataset too big to store, or even read, completely? Sublinear algorithms efficiently extract meaningful results from massive data. They are built on beautiful mathematics for tackling problems that seem too big to handle head-on.

Differential Privacy

How can you learn from sensitive data without revealing too much about any individual? Differential privacy enables meaningful analysis while protecting each user’s information, and it is increasingly recognized as a vital tool for the responsible, ethical use of data.

Foundations of Data Science

How can you harness data to solve real problems? Foundations of data science equips you with techniques that are both reliable and implementable, helping ideas move from concept to application and turning insights into actionable results.

Group Members
Masters Student

Shenghao Xie

2025-2026 Outstanding Engineering MS Graduate Student Award
Undergrad Student

Neeraj Gogate

Undergrad Student

Alisa Lu

Undergrad Student

Soham Nagawanshi

Undergrad Student

Alan Zhou

Alumni

Chen Wang

Postdoc (2024-2025), co-hosted with Vladimir Braverman

Next Stop: Assistant Professor at Rensselaer Polytechnic Institute (RPI).
Recent Projects

Adversarial Robustness in the Streaming Model

Designing streaming algorithms that remain accurate even when the input stream is chosen adaptively by an adversary. This work focuses on defending against "black box" and "white box" attacks in big data models.

Media Coverage: Texas A&M Engineering News, CMU School of Computer Science News
Supported by NSF

Differential Privacy

Developing algorithmic frameworks that enable high-utility data analysis while providing rigorous privacy guarantees. This research focuses on the trade-offs between accuracy, computational efficiency, and privacy budget in various differential privacy models.

Supported by Apple

Efficient Synthetic Data Generation

Developing mathematical frameworks for private and scalable data generation. This research uses geometric coresets and algebraic low-rank approximation to construct high-utility synthetic datasets that preserve the underlying structure of massive datasets while maintaining rigorous mathematical guarantees.

Supported by Oak Ridge National Laboratory
Featured Publications

Lp Sampling in Distributed Data Streams with Applications to Adversarial Robustness

Honghao Lin, Zhao Song, David P. Woodruff, Shenghao Xie, Samson Zhou, SODA, 2026.

Online Learning with Limited Information in the Sliding Window Model

Vladimir Braverman, Sumegha Garg, Chen Wang, David P. Woodruff, Samson Zhou, SODA, 2026.

Relative Error Fair Clustering in the Weak-Strong Oracle Model

Vladimir Braverman, Prathamesh Dharangutte, Shaofeng H.-C. Jiang, Hoai-An Nguyen, Chen Wang, Yubo Zhang, Samson Zhou, ICML, 2025.

Perfect Sampling in Turnstile Streams Beyond Small Moments

David P. Woodruff, Shenghao Xie, Samson Zhou, PODS, 2025.

Group Life & Events