Abstract
In this work, we address the problem of predicting dropout risk in
undergraduate studies from the perspective of algorithmic fairness. We develop
a machine learning method to predict the risks of university dropout and
underperformance. The objective is to understand whether such a system can
identify students at risk while avoiding potential discriminatory biases. When modeling
both risks, we obtain prediction models with an Area Under the ROC Curve (AUC)
of 0.77-0.78 based on the data available at enrollment time, before the
first year of studies begins. This data includes the students' demographics,
the high school they attended, and their admission (average) grade. Our models
are calibrated: they produce estimated probabilities for each risk, not mere
scores. We analyze whether this method leads to discriminatory outcomes for
sensitive groups in terms of prediction accuracy (AUC) and error rates
(Generalized False Positive Rate, GFPR, and Generalized False Negative Rate,
GFNR). The models exhibit equity in terms of AUC and GFNR across groups:
similar GFNR values imply a similar probability of failing to detect risk
among students who eventually drop out. The remaining disparities in GFPR are
addressed through a mitigation process that does not affect the calibration
of the models.
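The generalized error rates mentioned above extend the usual false positive and false negative rates to calibrated probability scores: the GFPR is the mean predicted risk among students who did not drop out, and the GFNR is the mean predicted "no-risk" probability among students who did. A minimal sketch of these metrics, together with AUC, is shown below; the function names and toy data are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def auc(y, s):
    """AUC as the probability that a random positive example receives a
    higher score than a random negative one (ties count as half)."""
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def gfpr(y, s):
    """Generalized FPR: mean predicted risk among true negatives."""
    return s[y == 0].mean()

def gfnr(y, s):
    """Generalized FNR: mean predicted no-risk probability among true positives."""
    return (1 - s[y == 1]).mean()

# Toy example: y = dropout labels, s = calibrated risk probabilities.
y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
print(auc(y, s), gfpr(y, s), gfnr(y, s))
```

A fairness audit of the kind described in the abstract would compute these quantities separately for each sensitive group and compare the resulting values.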