Evaluating the Markov Assumption for Web Usage Mining

Proceedings of the 5th ACM International Workshop on Web Information and Data Management, pages 82-89. New York, NY, USA, ACM, (2003)
DOI: 10.1145/956699.956717

Abstract

Web usage mining concerns the discovery of common browsing patterns, i.e., pages requested in sequence, from web logs. To cope with the enormous amounts of data, several aggregated structures based on statistical models of web surfing have appeared, e.g., the Hypertext Probabilistic Grammar (HPG) model [2]. These techniques typically rely on the Markov assumption with history depth n, i.e., it is assumed that the next requested page depends only on the last n pages visited. This assumption is not always valid, i.e., false browsing patterns may be discovered. However, to our knowledge there has been no systematic study of the validity of the Markov assumption with respect to web usage mining and the resulting quality of the mined browsing patterns. In this paper we systematically investigate the quality of browsing patterns mined from structures based on the Markov assumption. Formal measures of quality, based on the closeness of the mined patterns to the true traversal patterns, are defined, and an extensive experimental evaluation is performed, based on two substantial real-world data sets. The results indicate that a large number of rules must be considered to achieve high quality, that long rules are generally more distorted than shorter rules, and that the model yields knowledge of a higher quality when applied to more random usage patterns. Thus we conclude that Markov-based structures for web usage mining are best suited for tasks demanding less accuracy, such as pre-fetching, personalization, and targeted ads.
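The Markov assumption with history depth n described in the abstract can be illustrated with a minimal sketch. This is not the HPG model or the paper's quality measures; it is a plain order-n transition count over toy sessions, and the names `fit_markov`, `next_page_prob`, and the session data are hypothetical. It shows how a depth-1 model can assign probability to a path (A, B, E) that no visitor actually followed, while a depth-2 model does not.

```python
from collections import Counter, defaultdict


def fit_markov(sessions, n):
    """Count transitions from each length-n history to the next page."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(n, len(session)):
            history = tuple(session[i - n:i])
            counts[history][session[i]] += 1
    return counts


def next_page_prob(counts, history, page):
    """Estimate P(page | last n pages); 0.0 if the history was never seen."""
    seen = counts.get(tuple(history))
    if not seen:
        return 0.0
    return seen[page] / sum(seen.values())


# Hypothetical toy sessions (assumed for illustration, not the paper's data).
sessions = [["A", "B", "C"], ["D", "B", "E"]]

depth1 = fit_markov(sessions, 1)
depth2 = fit_markov(sessions, 2)

# Depth 1 only remembers "B", so it cannot tell which visitors came from A.
p_depth1 = next_page_prob(depth1, ["B"], "E")
# Depth 2 remembers "A, B", and no session continued from there to E.
p_depth2 = next_page_prob(depth2, ["A", "B"], "E")

print(p_depth1, p_depth2)
```

Under these assumptions, the depth-1 model treats A, B, E as a plausible pattern even though it never occurs in the sessions, which is the kind of false browsing pattern the abstract refers to.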

Description

Evaluating the Markov assumption for web usage mining
