Inbook

Scaling Learning Algorithms towards AI

Yoshua Bengio, and Yann LeCun.
MIT Press, (2007)

Abstract

One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to non-parametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex high-dimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning) and semi-supervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in which lower-level features or concepts are progressively combined into more abstract and higher-level representations. We argue that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
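
As a rough illustration of the "shallow architecture" the abstract refers to (this sketch is not part of the original entry), a standard kernel machine produces predictions of the form

\[ f(x) = b + \sum_{i=1}^{n} \alpha_i \, K(x, x_i), \]

where each term \( K(x, x_i) \) acts as a template matcher against training example \( x_i \) and the trainable coefficients \( \alpha_i \) constitute the single adaptive output layer. A deep architecture, by contrast, composes several trainable stages, schematically \( f(x) = f_L(\cdots f_2(f_1(x)) \cdots) \), so that intermediate layers can build progressively more abstract representations rather than relying on one layer of local templates.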
