M. Gasić. Department of Engineering, University of Cambridge, Cambridge, UK, PhD thesis, (January 2011)
The partially observable Markov decision process (POMDP) has been proposed as a model for dialogue that provides increased robustness to speech understanding errors, allows dialogue management behaviour to be optimised automatically, and is amenable to adaptation for different user types. The POMDP-based approach to dialogue management maintains a distribution over every possible dialogue state, the belief state. Based on this distribution, the system chooses the action that gives the highest expected reward, where the reward measures how good the dialogue is. The primary challenge with the POMDP-based approach, however, is the intractability both of maintaining the belief state and of optimising action selection. The Hidden Information State framework is a practical framework for building dialogue managers based on the POMDP approach. It achieves tractability by grouping the possible user goals into equivalence classes, which ensures that the belief state can be maintained efficiently. It optimises the dialogue policy in a much reduced belief state space, the summary space.

In this thesis, a more efficient state representation is presented which includes the representation of logical complements of concepts in the user request. On the one hand, this representation supports more complex dialogues that include logical expressions. On the other hand, it enables a pruning technique that places a bound on the size of the state space. Thus, no limit is required on the length of the dialogue or on the number of hypotheses received from the speech understanding module. More importantly, this enables real-world dialogue systems with large domains to be built.

This thesis also examines the potential for improving action selection. Firstly, the problem of optimising action selection in the summary space is examined.
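The belief tracking and expected-reward action selection described above can be sketched for a toy discrete case. This is a minimal illustration, not the thesis's actual HIS model: the two user goals, the observation likelihoods, and the reward values are all hypothetical.

```python
def update_belief(belief, obs_likelihood):
    """Bayesian belief update: reweight each dialogue-state hypothesis by
    how well it explains the latest (possibly erroneous) speech input."""
    posterior = {s: p * obs_likelihood[s] for s, p in belief.items()}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

def select_action(belief, reward):
    """Choose the action with the highest expected reward under the belief."""
    def expected(a):
        return sum(p * reward[a][s] for s, p in belief.items())
    return max(reward, key=expected)

# Toy example: two candidate user goals, noisy recognition favouring 'hotel'.
belief = {"hotel": 0.5, "restaurant": 0.5}
belief = update_belief(belief, {"hotel": 0.8, "restaurant": 0.2})

# Hypothetical immediate rewards for each (action, state) pair.
reward = {
    "confirm_hotel":      {"hotel": 10, "restaurant": -5},
    "confirm_restaurant": {"hotel": -5, "restaurant": 10},
}
print(select_action(belief, reward))  # → confirm_hotel
```

Because the system acts on the whole distribution rather than a single best hypothesis, a recognition error does not immediately commit the dialogue to the wrong goal.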
A method is then proposed that guarantees selection of optimal back-off actions in cases where the selected action cannot be mapped back to the original belief state space. Secondly, this thesis investigates the use of Gaussian processes to approximate the highest expected reward that can be obtained for every belief state and system action. Approximating this function with a Gaussian process yields a posterior distribution over function values, given a prior distribution and a set of observations. It is shown here that an adequate prior speeds up the optimisation of action selection. The posterior also provides an estimate of the uncertainty, which enables rapid adaptation to different user profiles. Overall, the methods proposed in this thesis take steps towards more flexible real-world spoken dialogue systems.
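The Gaussian process idea can be illustrated with generic GP regression over a one-dimensional input standing in for a belief-action feature. The squared-exponential kernel, its length-scale, and the training data below are all illustrative assumptions, not the thesis's actual model; the point is that the posterior gives both a mean estimate and an uncertainty that shrinks near observed data.

```python
import numpy as np

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between 1-D input arrays a and b
    (illustrative length-scale, not tuned to any real data)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-3):
    """Posterior mean and variance at x_test under a zero-mean GP prior."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    K_ss = rbf(x_test, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    var = np.diag(K_ss - K_s.T @ np.linalg.solve(K, K_s))
    return mean, var

# Hypothetical observed returns at three belief points; query one point
# close to the data and one far from it.
x = np.array([0.1, 0.5, 0.9])
y = np.array([1.0, 3.0, 2.0])
mean, var = gp_posterior(x, y, np.array([0.5, 2.0]))
# var is small at 0.5 (seen) and large at 2.0 (unseen): this uncertainty
# estimate is what enables rapid adaptation to different user profiles.
```

A well-chosen prior mean or kernel concentrates the posterior after few observations, which is the sense in which an adequate prior speeds up policy optimisation.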