H. Lin and M. Tegmark (2016). arXiv:1608.08225. Comment: 14 pages, 3 figs.
Abstract
We show how the success of deep learning depends not only on mathematics but
also on physics: although well-known mathematical theorems guarantee that
neural networks can approximate arbitrary functions well, the class of
functions of practical interest can be approximated through "cheap learning"
with exponentially fewer parameters than generic ones, because they have
simplifying properties tracing back to the laws of physics. The exceptional
simplicity of physics-based functions hinges on properties such as symmetry,
locality, compositionality and polynomial log-probability, and we explore how
these properties translate into exceptionally simple neural networks
approximating both natural phenomena such as images and abstract
representations thereof such as drawings. We further argue that when the
statistical process generating the data is of a certain hierarchical form
prevalent in physics and machine-learning, a deep neural network can be more
efficient than a shallow one. We formalize these claims using information
theory and discuss the relation to renormalization group procedures. Various
"no-flattening theorems" show when these efficient deep networks cannot be
accurately approximated by shallow ones without efficiency loss - even for
linear networks.
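To make the abstract's parameter-counting claim concrete, here is a minimal worked sketch. This is an editorial illustration, not text from the paper: the Boltzmann form, the degree bound d, and the four-neuron product gadget are assumptions chosen for exposition.

% Illustrative sketch only; symbols n, d, lambda, and sigma are expository assumptions.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Write the data-generating distribution in Hamiltonian (Boltzmann) form,
\[
  p(\mathbf{x}) \;=\; \frac{e^{-H(\mathbf{x})}}{Z},
  \qquad \mathbf{x} = (x_1,\dots,x_n).
\]
A generic $H$ on $n$ binary variables requires of order $2^{n}$ parameters,
but if $H$ is a polynomial of degree at most $d$ (typical of physical
Hamiltonians), its monomial coefficients number only
\[
  \binom{n+d}{d} \;=\; O\!\left(n^{d}\right) \quad \text{for fixed } d,
\]
which is one sense in which such functions admit ``cheap'' approximation.
A depth advantage can be illustrated with products: for a smooth activation
$\sigma$ with $\sigma''(0) \neq 0$ and small $\lambda$, Taylor expansion gives
\[
  x y \;\approx\; \frac{\sigma(\lambda(x+y)) + \sigma(-\lambda(x+y))
                        - \sigma(\lambda(x-y)) - \sigma(-\lambda(x-y))}
                       {4\,\sigma''(0)\,\lambda^{2}},
\]
so two numbers can be multiplied with four neurons, and $n$ numbers with
$O(n)$ neurons arranged in a pairwise (deep) tree of such gadgets, whereas
computing the same product in a single hidden layer is exponentially more
expensive.
\end{document}

The exponential-versus-polynomial gap in the last step is the kind of separation the abstract's "no-flattening theorems" are meant to quantify.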
Description
[1608.08225] Why does deep and cheap learning work so well?
%0 Generic
%1 lin2016cheap
%A Lin, Henry W.
%A Tegmark, Max
%D 2016
%K deeplearning why work
%T Why does deep and cheap learning work so well?
%U http://arxiv.org/abs/1608.08225
%X We show how the success of deep learning depends not only on mathematics but
also on physics: although well-known mathematical theorems guarantee that
neural networks can approximate arbitrary functions well, the class of
functions of practical interest can be approximated through "cheap learning"
with exponentially fewer parameters than generic ones, because they have
simplifying properties tracing back to the laws of physics. The exceptional
simplicity of physics-based functions hinges on properties such as symmetry,
locality, compositionality and polynomial log-probability, and we explore how
these properties translate into exceptionally simple neural networks
approximating both natural phenomena such as images and abstract
representations thereof such as drawings. We further argue that when the
statistical process generating the data is of a certain hierarchical form
prevalent in physics and machine-learning, a deep neural network can be more
efficient than a shallow one. We formalize these claims using information
theory and discuss the relation to renormalization group procedures. Various
"no-flattening theorems" show when these efficient deep networks cannot be
accurately approximated by shallow ones without efficiency loss - even for
linear networks.
@misc{lin2016cheap,
abstract = {We show how the success of deep learning depends not only on mathematics but
also on physics: although well-known mathematical theorems guarantee that
neural networks can approximate arbitrary functions well, the class of
functions of practical interest can be approximated through "cheap learning"
with exponentially fewer parameters than generic ones, because they have
simplifying properties tracing back to the laws of physics. The exceptional
simplicity of physics-based functions hinges on properties such as symmetry,
locality, compositionality and polynomial log-probability, and we explore how
these properties translate into exceptionally simple neural networks
approximating both natural phenomena such as images and abstract
representations thereof such as drawings. We further argue that when the
statistical process generating the data is of a certain hierarchical form
prevalent in physics and machine-learning, a deep neural network can be more
efficient than a shallow one. We formalize these claims using information
theory and discuss the relation to renormalization group procedures. Various
"no-flattening theorems" show when these efficient deep networks cannot be
accurately approximated by shallow ones without efficiency loss - even for
linear networks.},
added-at = {2016-09-05T07:35:09.000+0200},
author = {Lin, Henry W. and Tegmark, Max},
biburl = {https://www.bibsonomy.org/bibtex/293827d3423151e0cc4b320c523d47b77/thoni},
description = {[1608.08225] Why does deep and cheap learning work so well?},
interhash = {a6204cfb8c159bd83e32cdadf7468f56},
intrahash = {93827d3423151e0cc4b320c523d47b77},
keywords = {deeplearning why work},
note = {cite arxiv:1608.08225. Comment: 14 pages, 3 figs},
timestamp = {2016-11-02T06:50:19.000+0100},
title = {Why does deep and cheap learning work so well?},
url = {http://arxiv.org/abs/1608.08225},
year = 2016
}