Thesis defense: Mohammad Azar (Donders Series 99)
October 25, 2012.
Promotor: Prof. dr. H.J. Kappen; copromotor: Dr. R. Munos
On the theory of reinforcement learning: methods, convergence analysis and sample complexity
The focus of this thesis is the problem of reinforcement learning (RL) in infinite-horizon discounted-reward Markov decision processes (MDPs). In particular, we concentrate on estimating the optimal policy and the optimal value function from samples: we develop new RL algorithms and analyze their asymptotic and finite-time performance. In addition, we refine some of the existing theoretical results for well-known RL algorithms such as model-based value iteration and policy iteration. Further, we prove general lower bounds on the number of samples required by any RL algorithm to achieve a near-optimal solution (sample-complexity bounds).
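To make the setting concrete, the following is a minimal sketch of classical value iteration on a finite discounted MDP, the dynamic-programming baseline that the model-based algorithms discussed in the thesis build on. The toy transition and reward numbers are hypothetical, chosen only for illustration; this is not the thesis's own algorithm.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Classical value iteration for a finite discounted MDP.

    P: transition tensor of shape (A, S, S), P[a, s, s'] = Pr(s' | s, a)
    R: reward matrix of shape (S, A)
    Returns the (near-)optimal value function V and a greedy policy.
    """
    S, A = R.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup:
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(a, s, s') * V(s')
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Toy 2-state, 2-action MDP (hypothetical numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.5, 0.5], [0.1, 0.9]]])  # transitions under action 1
R = np.array([[1.0, 0.0],   # rewards in state 0, per action
              [0.0, 2.0]])  # rewards in state 1, per action
V, pi = value_iteration(P, R)
```

Because the Bellman operator is a gamma-contraction, the loop converges geometrically; the sample-complexity question the thesis studies arises when P and R are unknown and must themselves be estimated from samples.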