Abstract
Sequential traces of user data are frequently observed online and offline,
e.g., as sequences of visited websites or as sequences of locations captured by
GPS. However, understanding the factors that explain the production of sequence
data is a challenging task, especially since the data generation is often not
homogeneous. For example, navigation behavior might change in different phases
of a website visit, or movement behavior may vary between groups of users. In
this work, we tackle this task and propose MixedTrails, a Bayesian approach for
comparing the plausibility of hypotheses regarding the generative processes of
heterogeneous sequence data. Each hypothesis represents a belief about
transition probabilities between a set of states that can vary between groups
of observed transitions. For example, when trying to understand human movement
in a city, a hypothesis assuming that tourists are more likely than locals to
move towards points of interest can be shown to be more plausible given the
observed data than a hypothesis assuming the opposite. Our approach
incorporates these beliefs as Bayesian priors in a generative mixed transition
Markov chain model, and compares their plausibility utilizing Bayes factors. We
discuss analytical and approximate inference methods for calculating the
marginal likelihoods for Bayes factors, give guidance on interpreting the
results, and illustrate our approach with several experiments on synthetic and
empirical data from Wikipedia and Flickr. Thus, this work enables a novel kind
of analysis for studying sequential data in many application areas.
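
To make the comparison step concrete, the following is a minimal sketch and not the authors' implementation. It assumes that every transition has a known, fixed group (here the hypothetical groups "tourists" and "locals"), that each hypothesis is encoded as Dirichlet pseudo-counts over the rows of a per-group transition matrix, and that groups are independent, so the log marginal likelihood is a sum of standard Dirichlet-multinomial terms. The function names, the two-state setup (state 0 = point of interest, state 1 = elsewhere), and all counts are made up for illustration.

```python
from math import lgamma


def log_marginal_likelihood(counts, alpha):
    """Log evidence of one Markov chain with a Dirichlet prior on each row.

    counts[i][j]: observed transitions from state i to state j.
    alpha[i][j]:  Dirichlet pseudo-counts encoding the hypothesis (all > 0).
    """
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        # Normalizing constants of the prior and the posterior Dirichlet.
        total += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        # Per-cell terms.
        total += sum(lgamma(a + n) - lgamma(a) for a, n in zip(a_row, n_row))
    return total


def log_bayes_factor(group_counts, prior_h1, prior_h2):
    """Log Bayes factor of H1 over H2, assuming fixed group assignments.

    Each argument maps a group name to a matrix (list of rows).
    """
    lml1 = sum(log_marginal_likelihood(group_counts[g], prior_h1[g])
               for g in group_counts)
    lml2 = sum(log_marginal_likelihood(group_counts[g], prior_h2[g])
               for g in group_counts)
    return lml1 - lml2


# Illustrative (made-up) counts: tourists mostly move to state 0, locals to state 1.
group_counts = {
    "tourists": [[8, 2], [9, 1]],
    "locals": [[2, 8], [1, 9]],
}
# H1: tourists prefer points of interest, locals do not. H2: the opposite.
h1 = {"tourists": [[5, 1], [5, 1]], "locals": [[1, 5], [1, 5]]}
h2 = {"tourists": [[1, 5], [1, 5]], "locals": [[5, 1], [5, 1]]}

log_bf = log_bayes_factor(group_counts, h1, h2)
print(f"log Bayes factor (H1 vs. H2): {log_bf:.2f}")
```

A positive log Bayes factor favors the first hypothesis. When group membership is itself uncertain, this closed form no longer applies directly, which is where the approximate inference mentioned in the abstract comes in.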
Description
MixedTrails: Bayesian Hypotheses Comparison on Heterogeneous Sequential Data