Abstract
The desire to run neural networks on low-capacity edge devices has led to the
development of a wealth of compression techniques. Moonshine is a simple and
powerful example of this: one takes a large pre-trained network and substitutes
each of its convolutional blocks with a selected cheap alternative block, then
distills the resultant network with the original. However, not all blocks are
created equally; for a required parameter budget there may exist a potent
combination of many different cheap blocks. In this work, we find these by
developing BlockSwap: an algorithm for choosing networks with interleaved block
types by passing a single minibatch of training data through randomly
initialised networks and gauging their Fisher potential. We show that
block-wise cheapening yields more accurate networks than single block-type
networks across a spectrum of parameter budgets. Code is available at
https://github.com/BayesWatch/pytorch-blockswap.
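
As a rough illustration of the scoring step described above (not the paper's exact implementation), the sketch below scores a randomly initialised candidate network by passing a single minibatch through it and summing a per-channel empirical Fisher estimate over its ReLU activations. The function name `fisher_potential`, the choice of hooking ReLU modules, and the use of the squared spatially-summed activation-gradient product are assumptions for illustration; consult the linked repository for the authors' implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

def fisher_potential(model, inputs, targets):
    """Score a candidate network on one minibatch (illustrative sketch)."""
    acts, grads = {}, {}

    def save_act(name):
        def forward_hook(module, inp, out):
            acts[name] = out

            def store_grad(g):
                grads[name] = g

            # Capture the gradient w.r.t. this activation on backward.
            out.register_hook(store_grad)
        return forward_hook

    handles = [m.register_forward_hook(save_act(n))
               for n, m in model.named_modules()
               if isinstance(m, nn.ReLU)]

    model.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()

    total = 0.0
    for name, a in acts.items():
        g = grads[name]
        # Assumes 4-D feature maps (batch, channels, height, width):
        # square the spatially-summed activation-gradient product,
        # average over the batch, and sum over channels.
        prod = (a * g).sum(dim=(2, 3))
        total += prod.pow(2).mean(dim=0).sum().item() / 2

    for h in handles:
        h.remove()
    return total
```

Under this sketch, a search would generate random interleaved block assignments within the parameter budget, score each candidate with the same single minibatch, and distill the highest-scoring candidate from the original network.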