Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
J. Buolamwini and T. Gebru. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81, pages 77–91, New York, PMLR, 2018.
Abstract
Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist-approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.
%0 Conference Paper
%1 buolamwini2018gender
%A Buolamwini, Joy
%A Gebru, Timnit
%B Proceedings of the 1st Conference on Fairness, Accountability and Transparency
%C New York
%D 2018
%E Friedler, Sorelle A.
%E Wilson, Christo
%I PMLR
%K artificial_intelligence gender_discrimination machine_learning
%V 81
%P 77–91
%T Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
%U http://proceedings.mlr.press/v81/buolamwini18a.html
%X Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.
@inproceedings{buolamwini2018gender,
abstract = {Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.},
address = {New York},
author = {Buolamwini, Joy and Gebru, Timnit},
booktitle = {Proceedings of the 1st Conference on Fairness, Accountability and Transparency},
editor = {Friedler, Sorelle A. and Wilson, Christo},
keywords = {artificial_intelligence gender_discrimination machine_learning},
language = {eng},
volume = {81},
pages = {77--91},
publisher = {PMLR},
series = {Proceedings of Machine Learning Research},
title = {Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification},
url = {http://proceedings.mlr.press/v81/buolamwini18a.html},
year = 2018
}