Abstract
We introduce a lightweight, power-efficient, and general-purpose
convolutional neural network, ESPNetv2, for modeling visual and sequential
data. Our network uses group point-wise and depth-wise dilated separable
convolutions to learn representations from a large effective receptive field
with fewer FLOPs and parameters. The performance of our network is evaluated on
three different tasks: (1) object classification, (2) semantic segmentation,
and (3) language modeling. Experiments on these tasks, including image
classification on ImageNet and language modeling on the Penn Treebank
dataset, demonstrate that our method outperforms
state-of-the-art methods. Our network has better generalization properties than
ShuffleNetv2 when tested on the MSCOCO multi-object classification task and the
Cityscapes urban scene semantic segmentation task. Our experiments show that
ESPNetv2 is much more power-efficient than existing state-of-the-art efficient
methods, including ShuffleNets and MobileNets. Our code is open-source and
available at https://github.com/sacmehta/ESPNetv2.
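To illustrate why group point-wise and depth-wise dilated convolutions reduce cost while enlarging the receptive field, here is a minimal parameter-counting sketch. It is not the ESPNetv2 implementation: the function names are hypothetical, biases are ignored, and the "separable block" is assumed to be a generic group point-wise projection followed by a single depth-wise dilated convolution, not the paper's exact unit.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def group_pointwise_params(c_in: int, c_out: int, groups: int) -> int:
    """Weights of a 1x1 convolution split into `groups` independent groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps c_in/groups inputs to c_out/groups outputs.
    return groups * (c_in // groups) * (c_out // groups)


def depthwise_params(channels: int, k: int) -> int:
    """Weights of a k x k depth-wise convolution (one filter per channel).

    Dilation does not change this count; it only spreads the k x k taps apart.
    """
    return k * k * channels


def effective_kernel_size(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def separable_block_params(c_in: int, c_out: int, k: int, groups: int) -> int:
    """Hypothetical block: group point-wise conv, then depth-wise (dilated) conv."""
    return group_pointwise_params(c_in, c_out, groups) + depthwise_params(c_out, k)


if __name__ == "__main__":
    dense = standard_conv_params(128, 128, 3)
    light = separable_block_params(128, 128, 3, groups=4)
    print(dense, light)                 # 147456 5248
    print(effective_kernel_size(3, 4))  # 9 (a 3x3 kernel spanning a 9x9 window)
```

In this toy setting the separable block uses roughly 28x fewer weights than a dense 3x3 convolution, and raising the dilation rate widens the covered window without adding any parameters.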