Abstract
We show that the gradient descent algorithm provides an implicit
regularization effect in the learning of over-parameterized matrix
factorization models and one-hidden-layer neural networks with quadratic
activations. Concretely, we show that given $O(dr^2)$ random linear
measurements of a rank $r$ positive semidefinite matrix $X^\star$, we can
recover $X^\star$ by parameterizing it by $UU^\top$ with $U \in \mathbb{R}^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove
that starting from a small initialization, gradient descent recovers
$X^\star$ approximately in $O(r)$ iterations. These results resolve the conjecture of Gunasekar et al.'17 under the restricted isometry
property. With some technical modifications, the technique can also be applied to analyzing one-hidden-layer neural networks with quadratic activations.
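As a rough illustration of the setting described in the abstract (a toy sketch under assumptions, not the authors' implementation): the snippet below draws random Gaussian measurements of a low-rank PSD matrix $X^\star$ and runs gradient descent on the squared loss over the over-parameterized factor $U \in \mathbb{R}^{d\times d}$, starting from a small initialization. The dimensions, step size, and iteration count are illustrative choices only.

```python
import numpy as np

# Toy instance of over-parameterized matrix sensing (illustrative parameters).
d, r = 30, 3
m = 5 * d * r**2                      # on the order of d * r^2 measurements
rng = np.random.default_rng(0)

# Ground-truth rank-r positive semidefinite matrix X* = B B^T.
B = rng.normal(size=(d, r)) / np.sqrt(d)
X_star = B @ B.T

# Symmetric Gaussian measurement matrices (satisfy RIP with high probability)
# and noiseless measurements y_k = <A_k, X*>.
A = rng.normal(size=(m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('kij,ij->k', A, X_star)

# Over-parameterized factor: U is d x d (not d x r), small random initialization.
U = 1e-3 * rng.normal(size=(d, d))
eta = 0.1                             # step size for the (1/m)-scaled loss

for t in range(500):
    residual = np.einsum('kij,ij->k', A, U @ U.T) - y
    # Gradient of f(U) = (1/(2m)) * sum_k (<A_k, U U^T> - y_k)^2.
    grad = (2.0 / m) * np.einsum('k,kij->ij', residual, A) @ U
    U -= eta * grad

print("relative error:",
      np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```

Even though $U$ has $d^2$ parameters and the loss has many global minimizers, the small initialization is what biases gradient descent toward the low-rank solution in this sketch, mirroring the implicit regularization effect the abstract refers to.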