Abstract
Though deep learning models have taken on commercial and political relevance,
many aspects of their training and operation remain poorly understood. This has
sparked interest in "science of deep learning" projects, many of which are run
at scale and require enormous amounts of time, money, and electricity. But how
much of this research really needs to occur at scale? In this paper, we
introduce MNIST-1D: a minimalist, low-memory, and low-compute alternative to
classic deep learning benchmarks. The training examples are 20 times smaller
than MNIST examples, yet they differentiate more clearly between linear,
nonlinear, and convolutional models, which attain 32%, 68%, and 94% accuracy,
respectively (these models obtain 94%, 99+%, and 99+% on MNIST). We then present
example use cases, including measuring the spatial inductive biases of
lottery tickets, observing deep double descent, and metalearning an activation
function.