Exploring the Future of Fairness and Privacy with Synthetic Data: Insights from Our Workshop
Last Tuesday, the Network Institute had the pleasure of hosting a Network Institute-awarded workshop, organized in collaboration with the Inclusive AI Lab, which brought together academics, industry professionals, and representatives of public institutions in three thematic sessions:
- Ethical and Responsible Applications
- Debiasing Data & Assessing Fairness
- Synthetic Data in Context: Focus on Healthcare
First of all, we would like to extend a heartfelt thank you to the people who made this event possible: Marianna Capasso and Payal Arora, together with computer scientist Emma Beauxis-Aussalet and sociologist Renate Baumgartner of Vrije Universiteit Amsterdam (VU Amsterdam).
And a special thanks to Weijie Huang for taking pictures and supporting the event!
Summary of the discussion
Stakeholders explored practical use cases and the legal, ethical, and social challenges of generating and using synthetic data. Some actionable insights:
- Synthetic data is here to stay but requires ongoing, standardized monitoring to optimize its responsible use. While synthetic data generation has improved significantly—Gartner predicts synthetic data will overshadow real data in AI models by 2030—it still comes with quality and ethical risks. Thorough evaluation, including developing metrics for quality and privacy-utility trade-offs, remains crucial.
- No one-size-fits-all solution exists for synthetic datasets. Context and domain knowledge are essential.
- Synthetic data is not a ‘technical fix’ or a cure-all. It is not neutral. Broader, global societal perspectives are needed to address its challenges and promises.
- We need to ask the right questions to navigate this evolving landscape. Who controls synthetic data generation tools and platforms? How can we foster substantive fairness and data justice?
- Data must reflect contemporary social values of inclusion, sustainability, and responsible innovation. Without this, we risk phenomena like model collapse, where quality and diversity degrade over successive generations of synthetic data.
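The model-collapse point above can be illustrated with a toy simulation (our own sketch, not code or results from the workshop): a simple Gaussian "model" is fitted to data, used to generate synthetic samples, then refitted on those samples, and so on. With small samples, the estimated variance, a crude proxy for diversity, tends to shrink toward zero over generations.

```python
# Toy illustration of model collapse (hypothetical sketch, not workshop code):
# repeatedly fit a Gaussian to its own synthetic samples and watch the
# variance (a proxy for diversity) collapse over generations.
import random
import statistics

random.seed(0)

SAMPLES_PER_GENERATION = 10
GENERATIONS = 1000

# Generation 0: "real" data drawn from a standard normal distribution.
data = [random.gauss(0.0, 1.0) for _ in range(SAMPLES_PER_GENERATION)]

variances = []
for _ in range(GENERATIONS):
    # Fit the model: estimate mean and standard deviation from current data.
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    variances.append(sigma ** 2)
    # Train the next generation purely on synthetic samples from this model.
    data = [random.gauss(mu, sigma) for _ in range(SAMPLES_PER_GENERATION)]

print(f"variance at generation 0:    {variances[0]:.4f}")
print(f"variance at generation {GENERATIONS - 1}: {variances[-1]:.2e}")
```

The deliberately small sample size exaggerates the effect; larger samples slow the collapse but do not eliminate the downward drift, which is why the discussion stressed periodically refreshing pipelines with real, diverse data.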