Donders Institute for Brain, Cognition and Behaviour

Thesis defense Mohammad Azar (Donders Series 99)

October 25, 2012.

Promotor: Prof. dr. H.J. Kappen; copromotor: Dr. R. Munos

On the theory of reinforcement learning: methods, convergence analysis and sample complexity

This thesis focuses on reinforcement learning (RL) in infinite-horizon discounted-reward Markov decision processes (MDPs). In particular, we concentrate on estimating the optimal policy and the optimal value function by sampling: we develop new RL algorithms and analyze their asymptotic and finite-time performance. In addition, we refine some of the existing theoretical results for well-known RL algorithms such as model-based value iteration and policy iteration. Finally, we prove general lower bounds on the number of samples that any RL algorithm requires to achieve a near-optimal solution (sample complexity bounds).
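
As a rough illustration of what estimating the optimal value function by sampling can look like, the Python sketch below runs a simple form of model-based value iteration: it estimates the transition probabilities from sampled next states, then iterates the Bellman optimality update on the estimated model. It is not taken from the thesis. The two-state toy MDP (`TRUE_P`, `REWARDS`), the function names, and the parameter values are all assumptions made for illustration.

```python
import random

# Hypothetical toy MDP (an assumption, not from the thesis).
# TRUE_P[s][a][s2] is the probability of moving from state s to s2 under action a:
# action 0 tends to stay in the current state, action 1 tends to switch.
TRUE_P = [[[0.9, 0.1], [0.1, 0.9]],
          [[0.1, 0.9], [0.9, 0.1]]]
# REWARDS[s][a]: reward 1 for acting in state 1, reward 0 in state 0.
REWARDS = [[0.0, 0.0], [1.0, 1.0]]


def sample_next_state(s, a, rng):
    """Generative model: draw one next state for the pair (s, a)."""
    return rng.choices([0, 1], weights=TRUE_P[s][a])[0]


def model_based_value_iteration(sampler, rewards, n_states, n_actions,
                                gamma=0.9, n_samples=200, n_iters=200, seed=0):
    rng = random.Random(seed)

    # Step 1: estimate the transition model from n_samples draws per (s, a).
    p_hat = [[[0.0] * n_states for _ in range(n_actions)]
             for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s][a][sampler(s, a, rng)] += 1.0 / n_samples

    # Step 2: value iteration on the estimated model,
    # Q(s, a) <- r(s, a) + gamma * sum_s2 p_hat(s2 | s, a) * max_b Q(s2, b).
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        v = [max(row) for row in q]
        q = [[rewards[s][a]
              + gamma * sum(p * v[s2] for s2, p in enumerate(p_hat[s][a]))
              for a in range(n_actions)]
             for s in range(n_states)]

    # Greedy policy with respect to the final action-value estimate.
    policy = [max(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return q, policy


q, policy = model_based_value_iteration(sample_next_state, REWARDS, 2, 2)
print(policy)
```

In this toy setting, the number of samples per state-action pair (`n_samples`) controls how close the estimated model, and therefore the resulting value estimate, is to the true one. That trade-off is the kind of quantity a sample complexity bound describes.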