Abstract

We study the problem of nding an outlier-free subset of a set of points (or a probability distribution) in n-dimensional Euclidean space. A point x is de ned to be a -outlier if there exists some direction w in which its squared distance from the mean along w is greater than times the average squared distance from the mean along w1. Our main theorem is that for any  > 0, there exists a (1 ) fraction of the original distribution that has no O( n  (b+log n  ))-outliers, improving on the previous bound of O(n 7 b=). This bound is shown to b e nearly the best possible. The theorem is constructive, and results in a 1 1 approximation to the following optimization problem: given a distribution  (i.e. the ability to sample from it), and a parameter  > 0, nd the minimum for which there exists a subset of probability at least (1 ) with no -outliers.

Links and resources

Tags

community

  • @dblp
  • @ytyoun
@ytyoun's tags highlighted