Should fairness metrics (aka de-biasing metrics) assume you're doing data augmentation? #780

@npatki

Environment details

  • SDMetrics version: 0.21.0

Description

We have an upcoming fairness metric called EqualizedOddsImprovement. It is meant to indicate whether the synthetic data improves fairness compared to the real data.

For fairness/de-biasing metrics, we need to decide whether the expected use case is data augmentation or data replacement:

  1. Data augmentation means that you're expecting to add the synthetic data to the real data. So the metric should actually be comparing real data vs. real + synthetic data.
    • This would make sense if, for example, you want to ultimately use the data for training an ML model. (You wouldn't want to get rid of the real data in this case, but rather add to it.)
  2. Data replacement means that you're expecting to use the synthetic data in place of the real data. So the metric should actually be comparing real data vs. synthetic data only.
    • This would make sense if, for example, you want to ultimately share the synthetic data externally. (You would want to entirely replace the real data with synthetic data.)

Currently, the metric is designed with the replacement strategy in mind: it computes the equalized odds on the real data and then on the synthetic data, and returns the difference between the two.
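
To make the two options concrete, here is a minimal pure-Python sketch. It is not the SDMetrics implementation. It assumes one possible reading of "computing equalized odds on a dataset": train a binary classifier on that dataset, score it on held-out rows, and take the larger of the true-positive-rate gap and the false-positive-rate gap between two groups. The names `equalized_odds_gap`, `candidate_rows`, `fairness_change`, and `fit_predict`, as well as the `"label"` and `"group"` row keys, are hypothetical and used only for illustration.

```python
def rate(preds, labels, groups, group, label_value):
    """Share of positive predictions among rows of `group` whose true label is `label_value`."""
    picked = [p for p, y, g in zip(preds, labels, groups)
              if g == group and y == label_value]
    # Assumption: an empty slice counts as a rate of 0.0.
    return sum(picked) / len(picked) if picked else 0.0


def equalized_odds_gap(preds, labels, groups, group_a, group_b):
    """Larger of the TPR gap and the FPR gap between two groups (0.0 = no gap)."""
    tpr_gap = abs(rate(preds, labels, groups, group_a, 1)
                  - rate(preds, labels, groups, group_b, 1))
    fpr_gap = abs(rate(preds, labels, groups, group_a, 0)
                  - rate(preds, labels, groups, group_b, 0))
    return max(tpr_gap, fpr_gap)


def candidate_rows(real_rows, synthetic_rows, strategy):
    """Pick the data that gets compared against the real-data baseline."""
    if strategy == "replacement":
        # Synthetic data is used in place of the real data.
        return list(synthetic_rows)
    if strategy == "augmentation":
        # Synthetic data is added to the real data.
        return list(real_rows) + list(synthetic_rows)
    raise ValueError(f"unknown strategy: {strategy!r}")


def fairness_change(real_rows, synthetic_rows, validation_rows,
                    fit_predict, strategy, group_a, group_b):
    """Baseline gap minus candidate gap; positive means the candidate is fairer.

    `fit_predict(train_rows, validation_rows)` is a hypothetical callback that
    trains any classifier on `train_rows` and returns 0/1 predictions for
    `validation_rows`.
    """
    labels = [row["label"] for row in validation_rows]
    groups = [row["group"] for row in validation_rows]
    baseline = equalized_odds_gap(
        fit_predict(real_rows, validation_rows), labels, groups, group_a, group_b)
    candidate = equalized_odds_gap(
        fit_predict(candidate_rows(real_rows, synthetic_rows, strategy), validation_rows),
        labels, groups, group_a, group_b)
    return baseline - candidate
```

In this sketch the only difference between the two strategies is which rows `candidate_rows` returns, so the question in this issue comes down to which `strategy` the metric should assume by default.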

Additional Context

For the data augmentation case, see #779 for usage options.
