Abstract
The subspace approximation problem with outliers, for given $n$ points in $d$
dimensions $x_1,łdots, x_n R^d$, an integer $1 k d$, and
an outlier parameter $0 1$, is to find a $k$-dimensional
linear subspace of $R^d$ that minimizes the sum of squared distances to its
nearest $(1-\alpha)n$ points. More generally, the $\ell_p$ subspace
approximation problem with outliers minimizes the sum of $p$-th powers of
distances instead of the sum of squared distances. Even the case of robust PCA
is non-trivial, and previous work requires additional assumptions on the input.
Any multiplicative approximation algorithm for the subspace approximation
problem with outliers must solve the robust subspace recovery problem, a
special case in which the $(1-\alpha)n$ inliers in the optimal solution are
promised to lie exactly on a $k$-dimensional linear subspace. However, robust
subspace recovery is Small Set Expansion (SSE)-hard.
We show how to extend dimension reduction techniques and bi-criteria
approximations based on sampling to the problem of subspace approximation with
outliers. To get around the SSE-hardness of robust subspace recovery, we assume
that the squared distance error of the optimal $k$-dimensional subspace summed
over the optimal $(1-\alpha)n$ inliers is at least $\delta$ times its
squared-error summed over all $n$ points, for some $0 < 1 -
\alpha$. With this assumption, we give an efficient algorithm to find a subset
of $poly(k/\epsilon) łog(1/\delta) łogłog(1/\delta)$ points whose span
contains a $k$-dimensional subspace that gives a multiplicative
$(1+\epsilon)$-approximation to the optimal solution. The running time of our
algorithm is linear in $n$ and $d$. Interestingly, our results hold even when
the fraction of outliers $\alpha$ is large, as long as the obvious condition $0
< 1 - \alpha$ is satisfied.
Description
[2006.16573] Subspace approximation with outliers
Links and resources
Tags
community