Article

Predicting Web Users' Next Access Based on Log Data

Journal of Computational and Graphical Statistics, 12 (1): 143-155 (2003)
DOI: 10.1198/1061860031275

Abstract

This article considers models that describe how people browse the Web. We restrict our attention to navigation patterns within a single site, and base our study on standard Web server access logs. Given a visitor's previous activities on the site, we propose models that predict their next page request. If the prediction is reasonably accurate, we might consider “prefetching” the page before the visitor requests it. A more conservative use for such predictions would be to simply update the freshness records in a proxy or network cache, eliminating unnecessary If-Modified-Since requests. Using data from the Web site for the Computing and Mathematical Sciences Research Division of Lucent Technologies (cm.bell-labs.com) we first evaluate the predictive performance of low-order Markov models. We next consider mixtures of first-order Markov models, achieving a kind of clustering of Web pages in the site. This approach is shown to perform well, while significantly reducing the space required to store the model. Finally, we explore a Bayesian approach using a Dirichlet prior on the collection of links available to a user at each stage in their travels through the site. We show that the posterior probabilities derived under this model are fairly close to the cross-validation estimates of the probability of success.
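As a rough illustration of the first-order Markov idea mentioned in the abstract, the sketch below counts page-to-page transitions in sessionized log data and predicts the most likely next request. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `links` argument, the symmetric pseudo-count `alpha` (standing in for a Dirichlet prior over the candidate links), and the toy sessions are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_first_order(sessions):
    """Count page-to-page transitions across sessions (lists of page paths)."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current, links=None, alpha=1.0):
    """Return (page, probability) for the most likely next request, or None.

    links: candidate pages reachable from `current` (hypothetical input).
    alpha: symmetric Dirichlet pseudo-count added to every candidate (assumed).
    """
    observed = counts.get(current, Counter())
    candidates = sorted(set(links or ()) | set(observed))
    if not candidates:
        return None
    # Posterior mean under a symmetric Dirichlet prior: (count + alpha) / total.
    total = sum(observed.values()) + alpha * len(candidates)
    best = max(candidates, key=lambda page: observed[page] + alpha)
    return best, (observed[best] + alpha) / total


# Toy sessions, invented for illustration only.
sessions = [
    ["/", "/a", "/b"],
    ["/", "/a", "/c"],
    ["/", "/b"],
    ["/", "/a", "/b"],
]
counts = fit_first_order(sessions)
print(predict_next(counts, "/", links=["/a", "/b", "/c"]))
```

The smoothed probability returned alongside the prediction could serve as a threshold for deciding whether a prefetch is worthwhile, in the spirit of the prefetching use case the abstract describes.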
