Abstract
In this paper we introduce the concept of pseudo-MDPs to develop abstractions.
Pseudo-MDPs relax the requirement that the transition kernel has to be a probability kernel.
We show that the new framework captures many existing abstractions.
We also introduce factored linear action models, a special case of pseudo-MDPs.
Again, we discuss how factored linear action models relate to existing work.
We use the general framework to develop a theory for bounding the suboptimality of policies derived from pseudo-MDPs.
Specializing the framework, we recover existing results.
We give a least-squares approach and a constrained optimization approach to learning the factored linear model, along with efficient computation methods.
We demonstrate that the constrained optimization approach gives better performance than the least-squares approach with normalization.