Abstract
Data often have to be moved between servers and clients during the inference
phase. For instance, modern virtual assistants collect data on mobile devices
and send them to remote servers for analysis. In a related scenario, clients
have to access and download large amounts of data stored on servers in order to
apply machine learning models. Depending on the available bandwidth, this data
transfer can be a serious bottleneck that significantly limits the application
of machine learning models. In this work, we propose a simple yet effective
framework that selects only those parts of the input data that are needed for
the subsequent application of a given neural network.
The selection is performed by masks, which are trained simultaneously with the
neural network such that good model performance is achieved while only a
minimal amount of data is selected. During the inference phase, only the parts
selected by the masks have to be transferred between the server and the
client. Our experimental evaluation indicates that, for certain learning tasks,
the amount of data that needs to be transferred can be reduced significantly
without substantially affecting model performance.
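The joint training of masks and model can be illustrated with a small sketch. This is not the authors' method; every choice below is an assumption made for the example:

- Each input feature gets a soft mask m_j = sigmoid(a_j) that multiplies the input of a plain logistic-regression model.
- A sparsity penalty `lam` times the mean mask value pushes masks toward zero.
- A small weight decay `mu` is added to the model weights.
- The toy data, the function names (`make_toy_data`, `train_masked_logreg`) and the 0.5 selection threshold are all hypothetical.

```python
import math
from itertools import product


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def make_toy_data():
    """Hypothetical toy data: the label depends only on features 0 and 1,
    while features 2 and 3 are uninformative values from a symmetric grid."""
    signal = [-1.0, -0.3, 0.4, 0.9]
    noise = [-0.8, -0.2, 0.2, 0.8]
    xs, ys = [], []
    for x0, x1, x2, x3 in product(signal, signal, noise, noise):
        xs.append([x0, x1, x2, x3])
        ys.append(1.0 if x0 + x1 > 0 else 0.0)
    return xs, ys


def train_masked_logreg(xs, ys, lam=0.01, mu=1e-3, lr=0.5, steps=300):
    """Jointly train mask logits `a` and a logistic model (`w`, `b`) by
    full-batch gradient descent on

        mean BCE(sigmoid(sum_j w_j * m_j * x_j + b), y)
        + lam * mean_j m_j + mu * sum_j w_j**2,   where m_j = sigmoid(a_j).
    """
    n, d = len(xs), len(xs[0])
    a = [0.0] * d  # mask logits: every mask starts at 0.5
    w = [0.0] * d
    b = 0.0
    history = []
    for _ in range(steps):
        m = [sigmoid(aj) for aj in a]
        grad_w = [0.0] * d
        grad_m = [0.0] * d  # gradient of the summed BCE w.r.t. each mask value
        grad_b = 0.0
        bce = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(w[j] * m[j] * x[j] for j in range(d)) + b)
            r = p - y  # derivative of BCE w.r.t. the logit
            q = min(max(p, 1e-12), 1.0 - 1e-12)  # clip only inside the logs
            bce -= y * math.log(q) + (1.0 - y) * math.log(1.0 - q)
            grad_b += r
            for j in range(d):
                grad_w[j] += r * m[j] * x[j]
                grad_m[j] += r * w[j] * x[j]
        history.append(bce / n + lam * sum(m) / d + mu * sum(v * v for v in w))
        # Chain rule through m_j = sigmoid(a_j), using dm_j/da_j = m_j * (1 - m_j).
        # All gradients above were computed from the current parameters.
        a = [a[j] - lr * (grad_m[j] / n + lam / d) * m[j] * (1.0 - m[j])
             for j in range(d)]
        w = [w[j] - lr * (grad_w[j] / n + 2.0 * mu * w[j]) for j in range(d)]
        b -= lr * grad_b / n
    return [sigmoid(aj) for aj in a], w, b, history


if __name__ == "__main__":
    xs, ys = make_toy_data()
    masks, w, b, history = train_masked_logreg(xs, ys)
    # Only features whose mask exceeds the (assumed) threshold would be sent.
    selected = [j for j, mj in enumerate(masks) if mj > 0.5]
    print("masks:", [round(mj, 3) for mj in masks])
    print("selected features:", selected)
    print("loss: %.4f -> %.4f" % (history[0], history[-1]))
```

Why the weight decay `mu` is there: without it, the model could shrink a mask while scaling up the matching weight. The product w_j * m_j, and hence the prediction, would stay the same. The sparsity penalty would then be satisfied without any real feature selection.

At inference time, only the features whose learned mask passes the chosen threshold would have to be transferred.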