Abstract
Deep active learning (DAL) seeks to reduce annotation costs by enabling the model to actively query annotations for the instances from which it expects to learn the most. Despite extensive research, there is currently no standardized evaluation protocol for transformer-based language models in the field of DAL. Diverse experimental settings lead to difficulties in comparing research and deriving recommendations for practitioners. To tackle this challenge, we propose the ActiveGLAE benchmark, a comprehensive collection of data sets and evaluation guidelines for assessing DAL. Our benchmark aims to facilitate and streamline the evaluation process of novel DAL strategies. Additionally, we provide an extensive overview of current practice in DAL with transformer-based language models. We identify three key challenges (data set selection, model training, and DAL settings) that pose difficulties in comparing query strategies. We establish baseline results through an extensive set of experiments as a reference point for evaluating future work. Based on our findings, we provide guidelines for researchers and practitioners.