Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems
Degris, Thomas
LIP6 - Pôle IA
Université Paris 6
8, rue du capitaine Scott F-75015 Paris, France
email: thomas.degris@lip6.fr
Sigaud, Olivier
LIP6 - Pôle IA
Université Paris 6
8, rue du capitaine Scott F-75015 Paris, France
email: olivier.sigaud@lip6.fr
Wuillemin, Pierre-Henri
LIP6 - Pôle IA
Université Paris 6
8, rue du capitaine Scott F-75015 Paris, France
email: pierre-henri.wuillemin@lip6.fr home: www-desir.lip6.fr/~phw
Degris, Thomas and Sigaud, Olivier and Wuillemin, Pierre-Henri (2006) "Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems". In Proceedings of the 23rd International Conference on Machine Learning, pp. 257--264, ACM, Pittsburgh, Pennsylvania.
Recent decision-theoretic planning algorithms are able to find optimal solutions in large problems, using Factored Markov Decision Processes (FMDPs). However, these algorithms need perfect knowledge of the structure of the problem. In this paper, we propose SDYNA, a general framework for addressing large reinforcement learning problems by trial-and-error and with no initial knowledge of their structure. SDYNA integrates incremental planning algorithms based on FMDPs with supervised learning techniques building structured representations of the problem. We describe SPITI, an instantiation of SDYNA, that uses incremental decision tree induction to learn the structure of a problem combined with an incremental version of the Structured Value Iteration algorithm. We show that SPITI can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms by exploiting the generalization property of decision tree induction algorithms.
@InProceedings{,
author = {Degris, Thomas and Sigaud, Olivier and Wuillemin, Pierre-Henri},
title = {Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems},
booktitle = {Proceedings of the 23rd International Conference on Machine Learning},
year = {2006},
pages = {257--264},
address = {Pittsburgh, Pennsylvania},
publisher = {ACM}
}