Abstract
Many real-world datasets can be characterized as
follows: the "extrinsic dimension" of the data is
high, but the "intrinsic dimension" is low.
Consider for example the data generated by a motion
capture device. Such a device typically tracks a few
hundred dots located on a special suit worn by the
tracked person. Each time point corresponds to a vector
consisting of the (x,y,z) location of each dot. The
extrinsic dimension of these vectors is thus around one
thousand. However, the vectors are highly constrained
because the dots are placed on a human body that has
only a limited number of degrees of freedom. We say
that the "intrinsic dimension" is the number of
degrees of freedom of the data. We are interested in
learning algorithms whose performance scales with the
intrinsic dimension of the data. We present the random
projection trees algorithm, which has this type of
performance. Moreover, the algorithm is very efficient
computationally and can be performed in a streaming
fashion where each data point is seen only once. This
is joint work with Sanjoy Dasgupta.