Evaluating the Markov Assumption for Web Usage Mining

S. Jespersen, T. Pedersen, and J. Thorhauge. Proceedings of the 5th ACM International Workshop on Web Information and Data Management, pages 82--89. New York, NY, USA, ACM, (2003)
DOI: 10.1145/956699.956717

Abstract

Web usage mining concerns the discovery of common browsing patterns, i.e., pages requested in sequence, from web logs. To cope with the enormous amounts of data, several aggregated structures based on statistical models of web surfing have appeared, e.g., the Hypertext Probabilistic Grammar (HPG) model [2]. These techniques typically rely on the Markov assumption with history depth n, i.e., it is assumed that the next requested page is only dependent on the last n pages visited. This is not always valid, i.e., false browsing patterns may be discovered. However, to our knowledge there has been no systematic study of the validity of the Markov assumption wrt. web usage mining and the resulting quality of the mined browsing patterns. In this paper we systematically investigate the quality of browsing patterns mined from structures based on the Markov assumption. Formal measures of quality, based on the closeness of the mined patterns to the true traversal patterns, are defined and an extensive experimental evaluation is performed, based on two substantial real-world data sets. The results indicate that a large number of rules must be considered to achieve high quality, that long rules are generally more distorted than shorter rules and that the model yields knowledge of a higher quality when applied to more random usage patterns. Thus we conclude that Markov-based structures for web usage mining are best suited for tasks demanding less accuracy such as pre-fetching, personalization, and targeted ads.
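
The Markov assumption with history depth n, and the way it can produce false browsing patterns, can be illustrated with a minimal sketch. This is a plain transition-counting model written for illustration, not the HPG model or the quality measures from the paper; the function names `fit_markov` and `path_probability` and the two toy sessions are assumptions introduced here.

```python
from collections import Counter, defaultdict


def fit_markov(sessions, n=1):
    """Count transitions from each length-n history to the next page."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(n, len(session)):
            history = tuple(session[i - n:i])
            counts[history][session[i]] += 1
    return counts


def path_probability(counts, path, n=1):
    """Probability of the transitions in `path` that follow its first n pages."""
    prob = 1.0
    for i in range(n, len(path)):
        history = tuple(path[i - n:i])
        total = sum(counts[history].values())
        if total == 0:
            return 0.0  # this history was never observed
        prob *= counts[history][path[i]] / total
    return prob


# Two hypothetical sessions that share page "B" but nothing else.
sessions = [["A", "B", "C"], ["D", "B", "E"]]

# The path A -> B -> E never occurs in the sessions.
false_path = ["A", "B", "E"]

p1 = path_probability(fit_markov(sessions, n=1), false_path, n=1)
p2 = path_probability(fit_markov(sessions, n=2), false_path, n=2)
print(p1, p2)
```

With history depth 1 the model only remembers the current page, so it assigns a nonzero probability to a path no user followed; with history depth 2 the same path gets probability zero on this toy data.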

Description

Evaluating the markov assumption for web usage mining

Links and resources

BibTeX key: jespersen2003evaluating
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 5th ACM International Workshop on Web Information and Data Management
year: 2003
pages: 82--89
publisher: ACM
series: WIDM '03
acmid: 956717
isbn: 1-58113-725-7
location: New Orleans, Louisiana, USA
numpages: 8
DOI: 10.1145/956699.956717
url: http://doi.acm.org/10.1145/956699.956717

Cite this publication

@inproceedings{jespersen2003evaluating,
  abstract = {Web usage mining concerns the discovery of common browsing patterns, i.e., pages requested in sequence, from web logs. To cope with the enormous amounts of data, several aggregated structures based on statistical models of web surfing have appeared, e.g., the Hypertext Probabilistic Grammar (HPG) model [2]. These techniques typically rely on the Markov assumption with history depth n, i.e., it is assumed that the next requested page is only dependent on the last n pages visited. This is not always valid, i.e., false browsing patterns may be discovered. However, to our knowledge there has been no systematic study of the validity of the Markov assumption wrt. web usage mining and the resulting quality of the mined browsing patterns. In this paper we systematically investigate the quality of browsing patterns mined from structures based on the Markov assumption. Formal measures of quality, based on the closeness of the mined patterns to the true traversal patterns, are defined and an extensive experimental evaluation is performed, based on two substantial real-world data sets. The results indicate that a large number of rules must be considered to achieve high quality, that long rules are generally more distorted than shorter rules and that the model yields knowledge of a higher quality when applied to more random usage patterns. Thus we conclude that Markov-based structures for web usage mining are best suited for tasks demanding less accuracy such as pre-fetching, personalization, and targeted ads.},
  acmid = {956717},
  added-at = {2017-01-29T16:31:54.000+0100},
  address = {New York, NY, USA},
  author = {Jespersen, S{\o}ren and Pedersen, Torben Bach and Thorhauge, Jesper},
  biburl = {https://www.bibsonomy.org/bibtex/21736e55530e55ce7569947a14d8d74c1/becker},
  booktitle = {Proceedings of the 5th ACM International Workshop on Web Information and Data Management},
  description = {Evaluating the markov assumption for web usage mining},
  doi = {10.1145/956699.956717},
  interhash = {286e9e672104673002a28e471df0a01a},
  intrahash = {1736e55530e55ce7569947a14d8d74c1},
  isbn = {1-58113-725-7},
  keywords = {behavior chain diss inthesis markov model navigation order web},
  location = {New Orleans, Louisiana, USA},
  numpages = {8},
  pages = {82--89},
  publisher = {ACM},
  series = {WIDM '03},
  timestamp = {2017-01-29T16:31:54.000+0100},
  title = {Evaluating the Markov Assumption for Web Usage Mining},
  url = {http://doi.acm.org/10.1145/956699.956717},
  year = 2003
}
