Article

Predicting the Learning Rate of Gradient Descent for Accelerating Matrix Factorization

(2013)

Abstract

Matrix Factorization (MF) is the predominant technique in recommender systems. The model parameters are usually learned by means of numerical methods, such as gradient descent. The learning rate of gradient descent is typically set to small values in order to ensure that the algorithm does not miss a local optimum. As a consequence, the algorithm may take many iterations to converge, which increases the computational cost of the training phase. Ideally, one wants to find the learning rate that leads to a local optimum in a single iteration, but that is very difficult to achieve given the high complexity of the search space. However, if a single step lands close enough to the local optimum, only a few further iterations are needed until convergence, hence saving iterations and speeding up the learning process. Starting with an exploratory analysis on several recommender-system datasets, we observed that there is an overall linear relationship between the learning rate and the number of iterations needed until convergence. Another key observation is that this relationship holds across the different datasets, with only minor variations. From this, we propose to use simple linear regression models for predicting, for an unknown dataset, which learning rate leads to the minimum number of iterations. We show that, for some datasets, we can reduce the number of iterations by up to 50% when compared to the traditional approach.
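
To make the idea concrete, the following is a minimal sketch, not the paper's implementation: a plain SGD matrix factorizer that counts iterations until convergence, a few probe runs at different learning rates, and a least-squares line relating learning rate to iteration count, which is then used to pick the rate with the fewest predicted iterations. All names and values here (`mf_sgd_iterations`, the synthetic ratings matrix, the tolerance, and the probe grid) are illustrative assumptions, not details from the paper.

```python
import numpy as np

def mf_sgd_iterations(R, k, lr, tol=1e-4, max_iters=500, seed=0):
    """Factorize R ~= P @ Q.T with plain SGD and return the number of
    iterations (full passes over the observed ratings) until the training
    loss improves by less than `tol`."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    obs = np.argwhere(R > 0)            # observed (user, item) pairs
    prev_loss = np.inf
    for it in range(1, max_iters + 1):
        for u, i in obs:
            err = R[u, i] - P[u] @ Q[i]
            P[u] += lr * err * Q[i]     # gradient step on user factors
            Q[i] += lr * err * P[u]    # gradient step on item factors
        loss = sum((R[u, i] - P[u] @ Q[i]) ** 2 for u, i in obs)
        if prev_loss - loss < tol:      # improvement below tolerance
            return it
        prev_loss = loss
    return max_iters

# Exploratory probe on a synthetic ratings matrix (assumed stand-in for
# a real dataset): measure iterations-to-convergence at a few rates.
rng = np.random.default_rng(1)
R = (rng.random((30, 20)) * 5).round() * (rng.random((30, 20)) < 0.3)
probe_lrs = np.array([0.002, 0.005, 0.01, 0.02])
probe_iters = np.array([mf_sgd_iterations(R, k=5, lr=lr) for lr in probe_lrs])

# Fit the linear model iterations ~= a * lr + b, then pick, on a bounded
# grid of stable rates, the rate with the fewest predicted iterations.
a, b = np.polyfit(probe_lrs, probe_iters, deg=1)
grid = np.linspace(probe_lrs.min(), probe_lrs.max(), 50)
best_lr = grid[np.argmin(a * grid + b)]
print(f"predicted best learning rate: {best_lr:.4f}")
```

Because the fitted relationship is linear, the predicted minimum lies at an edge of the candidate grid; in practice the grid's upper bound plays the role of a stability cap, since rates beyond it risk divergence rather than faster convergence.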
