Fair Synthetic Data Does not Necessarily Lead to Fair Models

Dec 2, 2022

About

The Wasserstein GAN (WGAN) is a well-established model for generating high-quality synthetic data that approximates a given real dataset. We study TabFairGAN, a known tabular variant of WGAN in which a custom penalty term is added to the generator's loss to force it to produce fair data. We measure the fairness of synthetic data using demographic parity, i.e., the gap in the proportion of positive outcomes between sensitive groups. We reproduce some results from the paper and show empirically that, although the synthetic data achieves a low demographic parity gap, a classification model trained on that data and evaluated on real data may still output predictions with a high demographic parity gap, and is therefore unfair. In particular, we show empirically that this discrepancy holds across most of the fairness-accuracy tradeoff spectrum, except in the large-penalty case, where the model mode-collapses to the most frequent target outcome, and the low-penalty case, where the data is not constrained to be fair.
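As a minimal sketch of the metric described above, the demographic parity gap can be computed as the absolute difference in positive-outcome rates between two sensitive groups. The function name, the binary 0/1 encoding of the sensitive attribute, and the toy arrays below are illustrative assumptions, not data or code from the talk. The same function applies both to the labels in a synthetic dataset and to a classifier's predictions on real data, which is the comparison the abstract draws.

```python
import numpy as np


def demographic_parity_gap(outcomes, sensitive):
    """Absolute difference in positive-outcome rates between two groups.

    Assumes binary outcomes (0/1) and a binary sensitive attribute (0/1);
    both encodings are assumptions made for this sketch.
    """
    outcomes = np.asarray(outcomes)
    sensitive = np.asarray(sensitive)
    rate_group0 = outcomes[sensitive == 0].mean()
    rate_group1 = outcomes[sensitive == 1].mean()
    return abs(rate_group1 - rate_group0)


# Made-up toy values, purely illustrative (not results from the talk).
synthetic_labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])
synthetic_groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Hypothetical predictions of a classifier evaluated on real rows.
model_predictions = np.array([1, 1, 1, 0, 1, 0, 0, 0])
real_groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Positive rates 0.5 and 0.5 in the toy synthetic data.
data_gap = demographic_parity_gap(synthetic_labels, synthetic_groups)

# Positive rates 0.75 and 0.25 in the toy predictions.
model_gap = demographic_parity_gap(model_predictions, real_groups)

print(f"gap in synthetic data: {data_gap:.2f}")
print(f"gap in model predictions: {model_gap:.2f}")
```

In this toy setup the data itself shows no gap while the predictions do, which mirrors in miniature the kind of mismatch the abstract reports; the numbers carry no empirical weight.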

Interested in talks like this? Follow NeurIPS 2022