Status: Open
Labels: question (General question about the software)
Environment details
- SDMetrics version: 0.21.0
Description
We have an upcoming fairness metric called EqualizedOddsImprovement. It is meant to indicate whether the synthetic data improves fairness compared to the real data.
For fairness/de-biasing metrics, we should decide whether the expected use case is data augmentation or data replacement:
- Data augmentation means that you expect to add the synthetic data to the real data, so the metric should compare real data vs. real + synthetic data.
  - This would make sense if, for example, you ultimately want to use the data to train an ML model. (In this case you wouldn't want to discard the real data, but rather add to it.)
- Data replacement means that you expect to use the synthetic data in place of the real data, so the metric should compare real data vs. synthetic data only.
  - This would make sense if, for example, you ultimately want to share the synthetic data externally. (In this case you would want to replace the real data with synthetic data entirely.)
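To make the distinction concrete, here is a minimal sketch of which dataset ends up being compared against the real data under each strategy. The helper `comparison_data` and the list-of-dicts row representation are hypothetical, chosen only for illustration; this is not part of the SDMetrics API.

```python
from typing import Dict, List

# Hypothetical row representation: one dict per record.
Row = Dict[str, object]


def comparison_data(real: List[Row], synthetic: List[Row], strategy: str) -> List[Row]:
    """Return the dataset that is compared against the real data.

    Hypothetical helper for illustration only; not part of SDMetrics.
    """
    if strategy == "augmentation":
        # Augmentation: synthetic rows are appended to the real rows.
        return real + synthetic
    if strategy == "replacement":
        # Replacement: synthetic rows stand in for the real rows entirely.
        return list(synthetic)
    raise ValueError(f"unknown strategy: {strategy!r}")


real = [{"x": 1}, {"x": 2}]
synthetic = [{"x": 3}]
augmented = comparison_data(real, synthetic, "augmentation")
replaced = comparison_data(real, synthetic, "replacement")
print(len(augmented), len(replaced))
```

Under augmentation the comparison set still contains every real row, so any fairness change can only come from what the synthetic rows add. Under replacement the real rows are gone, so the synthetic data alone has to carry (or fix) the signal.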
Currently, the metric is designed with the replacement strategy in mind: it computes equalized odds on the real data and then on the synthetic data, and returns the difference between the two.
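As a rough illustration of the replacement-style computation, the sketch below measures an equalized-odds gap for two sets of binary predictions on the same evaluation rows (one from a model trained on real data, one from a model trained on synthetic data) and returns the difference. It assumes the gap is defined as the largest difference in true-positive rate or false-positive rate across sensitive groups. All function names are hypothetical, model training is omitted, and SDMetrics' actual EqualizedOddsImprovement may define and normalize its score differently.

```python
from typing import List, Sequence


def _positive_rate(y_true: Sequence[int], y_pred: Sequence[int],
                   groups: Sequence[str], group: str, label: int) -> float:
    """Share of rows in `group` with true label `label` that were predicted 1.

    With label=1 this is the TPR; with label=0 it is the FPR.
    Returns 0.0 if the group has no rows with that label (a simplifying choice).
    """
    idx = [i for i, g in enumerate(groups) if g == group and y_true[i] == label]
    if not idx:
        return 0.0
    return sum(y_pred[i] for i in idx) / len(idx)


def equalized_odds_gap(y_true: Sequence[int], y_pred: Sequence[int],
                       groups: Sequence[str]) -> float:
    """Largest TPR or FPR difference across groups (0.0 = equalized odds holds)."""
    unique = sorted(set(groups))
    tprs: List[float] = [_positive_rate(y_true, y_pred, groups, g, 1) for g in unique]
    fprs: List[float] = [_positive_rate(y_true, y_pred, groups, g, 0) for g in unique]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


def equalized_odds_improvement(real_gap: float, synthetic_gap: float) -> float:
    """Hypothetical score: positive when the synthetic-trained model is fairer."""
    return real_gap - synthetic_gap


# Toy evaluation rows shared by both models.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
real_pred = [1, 1, 0, 0, 1, 0, 1, 0]       # model trained on real data
synthetic_pred = [1, 1, 0, 1, 1, 1, 1, 0]  # model trained on synthetic data

real_gap = equalized_odds_gap(y_true, real_pred, groups)
synthetic_gap = equalized_odds_gap(y_true, synthetic_pred, groups)
print(real_gap, synthetic_gap, equalized_odds_improvement(real_gap, synthetic_gap))
```

For the augmentation case, the only change would be that the second model is trained on the real + synthetic rows instead of the synthetic rows alone; the gap and difference computation stays the same.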
Additional Context
For options on how the metric could be used in the data augmentation case, see #779.