Abstract
In the last year, new models and methods for pretraining and transfer
learning have driven striking performance improvements across a range of
language understanding tasks. The GLUE benchmark, introduced a little over one
year ago, offers a single-number metric that summarizes progress on a diverse
set of such tasks, but performance on the benchmark has recently surpassed the
level of non-expert humans, suggesting limited headroom for further research.
In this paper we present SuperGLUE, a new benchmark styled after GLUE with a
new set of more difficult language understanding tasks, a software toolkit, and
a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
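The single-number metric described above is a macro-average. The sketch below assumes the GLUE-style convention: average each task's metrics first (for tasks that report two metrics, such as F1 and accuracy), then take the unweighted mean across tasks. The task names and scores are hypothetical placeholders, not real benchmark results.

```python
from statistics import mean


def benchmark_score(task_metrics: dict[str, list[float]]) -> float:
    """Macro-average score (assumed GLUE-style convention).

    Each task's metrics are averaged into one per-task score, and the
    per-task scores are then averaged with equal weight per task.
    """
    per_task = [mean(metrics) for metrics in task_metrics.values()]
    return mean(per_task)


# Hypothetical tasks and scores, for illustration only.
scores = {
    "task_a": [80.0],        # single metric, e.g. accuracy
    "task_b": [70.0, 90.0],  # two metrics, averaged to 80.0
    "task_c": [65.0],
}
print(benchmark_score(scores))
```

Because every task counts equally regardless of dataset size, one summary number can hide large differences between individual tasks.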