Abstract
Temporally extended actions have been shown to improve the performance of reinforcement learning
agents. The 'options' framework provides a flexible way of representing such extended courses of
action in Markov decision processes. In this work we adapt the options framework to model an operating
system scheduler, which is expected not to let the processor stay idle while any process is ready or waiting
for execution. A process is allowed to use the CPU for a fixed quantum of time (the timeslice), and each
subsequent context switch incurs considerable overhead. We use the scheduler's historical
performance to reduce the number of redundant context switches. We propose a
machine-learning module, based on a temporally extended reinforcement-learning agent, to predict a better-performing
timeslice. We measure the importance of states in the options framework by evaluating the impact
of their absence, and propose an algorithm to identify such checkpoint states. We present an empirical
evaluation of our approach on a maze-world navigation task; its implications for an "adaptive timeslice
parameter" show improved throughput time.