Abstract
Deep learning models have been criticized for their lack of easy
interpretation, which undermines confidence in their use for important
applications. Nevertheless, they are widely used in many applications that
affect people's lives, mostly because of their superior performance.
Therefore, there is a great need for computational methods that
can explain, audit, and debug such models. Here, we use flip points to
accomplish these goals for deep learning models with continuous output scores
(e.g., computed by softmax), used in social applications. A flip point is any
point that lies on the boundary between two output classes: e.g., for a model
with a binary yes/no output, a flip point is any input that generates equal
scores for "yes" and "no". The flip point closest to a given input is of
particular importance because it reveals the smallest change to the input that
would change the model's classification, and we show that it is the solution to a
well-posed optimization problem. Flip points also enable us to systematically
study the decision boundaries of a deep learning classifier. The resulting
insight into the decision boundaries of a deep model can clearly explain the
model's output at the individual level, via an explanation report that is
understandable by non-experts. We also develop a procedure to understand and
audit model behavior towards groups of people. Flip points can also be used to
alter the decision boundaries in order to correct undesirable behaviors. We
demonstrate our methods by investigating several models trained on standard
datasets used in social applications of machine learning. We also identify the
features that are most responsible for particular classifications and
misclassifications.
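
To make the notion of a closest flip point concrete, here is a minimal sketch for the simplest possible case: a two-class softmax model whose logits are linear in the input. This is an illustrative assumption, not the paper's method. For a deep network the boundary is nonlinear and the closest flip point would have to be found by numerical optimization, whereas in the linear case it reduces to an orthogonal projection onto a hyperplane. The names `closest_flip_point`, `W`, `b`, and `x` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def closest_flip_point(x, W, b):
    """Closest flip point (Euclidean distance) for a two-class model
    with logits W @ x + b. Assumes a linear model, so the decision
    boundary is the hyperplane where the two logits are equal."""
    w = W[0] - W[1]          # normal vector of the decision boundary
    c = b[0] - b[1]          # offset of the decision boundary
    # Orthogonal projection of x onto {x_hat : w @ x_hat + c = 0}.
    return x - ((w @ x + c) / (w @ w)) * w

# Hypothetical weights and input, chosen only for illustration.
W = np.array([[1.0, 2.0], [-1.0, 0.5]])
b = np.array([0.5, -0.5])
x = np.array([2.0, 1.0])

x_flip = closest_flip_point(x, W, b)
scores = softmax(W @ x_flip + b)

print("input:        ", x)
print("flip point:   ", x_flip)
print("change needed:", x_flip - x)
print("scores at flip point:", scores)
```

At the flip point the two class scores are equal, and the difference `x_flip - x` shows how much each feature would have to change to reach the boundary, which is the kind of information an individual-level explanation can draw on.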
Description
[2001.00682] Auditing and Debugging Deep Learning Models via Decision Boundaries: Individual-level and Group-level Analysis