Abstract
One-shot pose estimates for tasks such as body joint localization, camera
pose estimation, and object tracking are generally noisy, and temporal filters
have been extensively used for regularization. One of the most widely used
methods is the Kalman filter, which is both extremely simple and general.
However, Kalman filters require a motion model and measurement model to be
specified a priori, which burdens the modeler and simultaneously demands that
we use explicit models that are often only crude approximations of reality. For
example, in the pose-estimation tasks mentioned above, it is common to use
motion models that assume constant velocity or constant acceleration, and we
believe that these simplified representations are severely limiting. In this
work, we propose to instead learn rich, dynamic representations of the motion
and noise models. In particular, we propose learning these models from data
using long short-term memory (LSTM), which allows representations that depend on all
previous observations and all previous states. We evaluate our method using
three of the most popular pose estimation tasks in computer vision, and in all
cases we obtain state-of-the-art performance.
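
For context, the hand-specified baseline that the abstract contrasts against can be sketched as a constant-velocity Kalman filter over a single pose coordinate. This is only an illustrative sketch of the classical filter, not the LSTM-based method the paper proposes. The function name `kalman_cv_1d`, the diagonal process noise `q`, the measurement noise `r`, and the identity initial covariance are all assumptions chosen for the example.

```python
def kalman_cv_1d(measurements, dt=1.0, q=1e-3, r=0.25):
    """Filter noisy 1-D positions with a constant-velocity Kalman filter.

    State is (position p, velocity v). The motion model is fixed by hand:
    p <- p + dt * v, v <- v. q and r are assumed noise variances.
    """
    p, v = measurements[0], 0.0          # start at the first measurement
    a, b, c, d = 1.0, 0.0, 0.0, 1.0      # covariance P = [[a, b], [c, d]]
    filtered = [p]
    for z in measurements[1:]:
        # Predict: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        # and Q = q * I (a simplified process-noise model).
        p = p + dt * v
        a, b, c, d = (
            a + dt * (b + c) + dt * dt * d + q,
            b + dt * d,
            c + dt * d,
            d + q,
        )
        # Update: only the position is observed, so H = [1, 0].
        s = a + r                        # innovation variance
        k0, k1 = a / s, c / s            # Kalman gain
        y = z - p                        # innovation
        p, v = p + k0 * y, v + k1 * y
        a, b, c, d = (1 - k0) * a, (1 - k0) * b, c - k1 * a, d - k1 * b
        filtered.append(p)
    return filtered


if __name__ == "__main__":
    noisy = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
    print(kalman_cv_1d(noisy))
```

Here the motion model and both noise variances are fixed in advance, which is the modeling burden the abstract describes. The proposed approach would instead learn these quantities from data.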
Description
[1708.01885] Long Short-Term Memory Kalman Filters: Recurrent Neural Estimators for Pose Regularization