
Affinity Clustering Framework for Data Debiasing Using Pairwise Distribution Discrepancy

Proceedings of the 2nd European Workshop on Algorithmic Fairness, Winterthur, Switzerland, June 7–9, 2023, volume 3442 of CEUR Workshop Proceedings, CEUR-WS.org, June 2023.

Abstract

Group imbalance, usually caused by insufficient or unrepresentative data collection procedures, is among the main reasons for the emergence of representation bias in datasets. Representation bias can exist with respect to different groups of one or more protected attributes and may lead to prejudicial and discriminatory outcomes toward certain groups of individuals if a learning model is trained on such biased data. In this paper, we propose MASC, a data augmentation approach based on affinity clustering of existing data in similar datasets. An arbitrary target dataset borrows protected-group instances from neighboring datasets located in the same cluster in order to balance the cardinality of its non-protected and protected groups. To form clusters in which datasets can share instances for protected-group augmentation, an affinity clustering pipeline is developed based on an affinity matrix. The affinity matrix is formed by computing the discrepancy of distributions between each pair of datasets and translating these discrepancies into a symmetric pairwise similarity matrix. A non-parametric spectral clustering is then applied to the affinity matrix, and the corresponding datasets are automatically categorized into an optimal number of clusters. We perform a step-by-step experiment as a demonstration of our method, both to illustrate the procedure of the proposed data augmentation approach and to evaluate and discuss its performance. In addition, we compare against other data augmentation methods before and after augmentation, and analyze the model performance of each competitor relative to our method. In our experiments, bias is measured in a non-binary protected-attribute setup with respect to racial group distributions for two separate minority groups, compared with the majority group, before and after debiasing.
Empirical results imply that our method of mitigating dataset bias, by augmenting with real (genuine) data from similar contexts, can debias the target datasets as effectively as existing data augmentation strategies.

Keywords: Distribution Shift, Affinity Clustering, Bias & Fairness, Maximum Mean Discrepancy, Data Debiasing, Data Augmentation
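The abstract's pipeline (pairwise distribution discrepancy → symmetric affinity matrix → clustering) can be sketched in a few lines. The following is a minimal numpy illustration, not the paper's implementation: it uses a biased RBF-kernel MMD estimate as the pairwise discrepancy and a simple exponential transform to turn discrepancies into affinities; the toy datasets, the `gamma` bandwidth, and the affinity scaling are all illustrative assumptions.

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy with an RBF kernel."""
    def k(A, B):
        sqdist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sqdist)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
# Three toy "datasets": the first two share a distribution, the third is shifted.
datasets = [rng.normal(0, 1, (200, 2)),
            rng.normal(0, 1, (200, 2)),
            rng.normal(5, 1, (200, 2))]

# Pairwise discrepancy matrix between datasets.
n = len(datasets)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = mmd2(datasets[i], datasets[j])

# Translate discrepancies into a symmetric affinity matrix:
# similar datasets (small MMD) get affinity near 1.
A = np.exp(-D / D[D > 0].mean())
```

A spectral clustering step (e.g. on the normalized Laplacian of `A`, with the number of clusters chosen by the eigengap) would then group datasets 0 and 1 together, making them candidates for sharing protected-group instances, while dataset 2 lands in its own cluster.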
