Article

On the mathematical foundations of learning

Felipe Cucker and Steve Smale.
Bulletin of the American Mathematical Society, 39 (2002), 1–49.

Abstract

In this paper the authors study the relationship between approximation and learning, emphasizing the primary role of sampling (inductive inference). Tools and ideas from linear algebra, probability theory and numerical analysis (the least squares algorithm) are widely used. As the authors say, "practical results are not the goal of this paper. Understanding is." With this purpose, they illustrate their statements with several examples.

The main setting is the following: one assumes the existence of an "unknown" function $f\colon X\to Y$ and a probability measure $\rho$ allowing one to randomly draw points in $X\times Y$, such that for $x\in X$ the expected value of a randomly drawn point $y\in Y$ is $f(x)$. The (least squares) error of $f$ is defined by $$E(f)=\int_{X\times Y}(f(x)-y)^2\,d\rho.$$ The main goal is to find $f$ minimizing $E(f)$, where $f$ is taken from a subfamily $H$ of the continuous functions on $X$ equipped with the sup-norm. Along with the optimizer $f_H$ of this problem, the empirical target function $f_z$ is considered, which minimizes the discrete least squares error for a given sample $z\in(X\times Y)^m$. The error $E(f_z)$ then decomposes as the sum of the sample error (which measures how good an approximation of $f_H$ the function $f_z$ is), studied in detail in Chapter I, and the approximation error, which depends only on $H$ and $\rho$ (Chapter II). Several estimates of both errors in terms of covering numbers are provided. The bias-variance trade-off is also discussed: for larger families $H$ the approximation error decreases but the sample error increases, and vice versa. Chapter III is devoted to a particular choice of the families $H$ which allows the analysis of algorithms from the point of view of reproducing kernel Hilbert spaces.
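For concreteness, writing the empirical (discrete least squares) error in the standard form $$E_z(f)=\frac{1}{m}\sum_{i=1}^{m}\bigl(f(x_i)-y_i\bigr)^2,\qquad z=\bigl((x_1,y_1),\dots,(x_m,y_m)\bigr)\in(X\times Y)^m$$ (this formula is the usual one and is supplied here for illustration rather than quoted from the review), the decomposition described above reads $$E(f_z)=\underbrace{E(f_z)-E(f_H)}_{\text{sample error}}+\underbrace{E(f_H)}_{\text{approximation error}},$$ so that the covering-number bounds on the sample error (Chapter I) and the estimates of the approximation error (Chapter II) combine to control $E(f_z)$.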
