Optimizing Long-term Predictions for Model-based Policy Search

2017

Conference Paper

We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories, as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive with state-of-the-art model learning methods. In contrast to these more involved models, our model can be employed directly for policy search, and it outperforms a baseline method in the robot experiment.
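The abstract contrasts fitting a dynamics model for one-step-ahead predictions with fitting a long-term prediction model to the likelihood of whole observed trajectories. The sketch below illustrates only that contrast, on a hypothetical scalar linear system x_{t+1} = a x_t + b u_t observed with Gaussian noise; it is not the authors' model or training procedure. The function names (`simulate`, `one_step_loss`, `long_term_nll`, `grid_fit`), the absence of process noise, the known noise level and the coarse grid search (standing in for the gradient-based optimization a real implementation would use) are all assumptions made for illustration.

```python
import math
import random


def simulate(a_true, b_true, inputs, x0, obs_noise_std, rng):
    """Simulate x_{t+1} = a*x_t + b*u_t and return noisy observations y_0..y_T."""
    x = x0
    ys = [x + rng.gauss(0.0, obs_noise_std)]
    for u in inputs:
        x = a_true * x + b_true * u
        ys.append(x + rng.gauss(0.0, obs_noise_std))
    return ys


def one_step_loss(a, b, ys, inputs):
    """Squared one-step-ahead errors; every prediction starts from the noisy y_t."""
    return sum(
        (ys[t + 1] - (a * ys[t] + b * u)) ** 2 for t, u in enumerate(inputs)
    )


def long_term_nll(a, b, ys, inputs, obs_noise_std):
    """Gaussian negative log-likelihood of the whole trajectory under a rollout.

    The model starts from y_0 and then feeds back only its own predictions,
    so errors compound over the horizon and are penalized directly.
    """
    var = obs_noise_std ** 2
    x_hat = ys[0]
    nll = 0.0
    for t, u in enumerate(inputs):
        x_hat = a * x_hat + b * u
        residual = ys[t + 1] - x_hat
        nll += 0.5 * (residual ** 2 / var + math.log(2.0 * math.pi * var))
    return nll


def grid_fit(loss, grid):
    """Return the (a, b) pair on the grid with the smallest loss."""
    return min(grid, key=lambda ab: loss(*ab))


if __name__ == "__main__":
    noise_std = 0.3  # hypothetical, assumed-known observation noise level
    inputs = [math.sin(0.1 * t) for t in range(200)]
    ys = simulate(0.95, 0.5, inputs, x0=1.0, obs_noise_std=noise_std,
                  rng=random.Random(0))
    grid = [(a / 100, b / 100) for a in range(80, 100) for b in range(20, 81, 5)]
    print("one-step fit: ",
          grid_fit(lambda a, b: one_step_loss(a, b, ys, inputs), grid))
    print("long-term fit:",
          grid_fit(lambda a, b: long_term_nll(a, b, ys, inputs, noise_std), grid))
```

Because the one-step loss always conditions on the noisy measurement y_t, observation noise enters its regressors, which in general biases least-squares estimates; the rollout-based likelihood only compares free-running predictions with the data. Any difference between the two printed estimates should be read as illustrative of this toy setting, not as a general result.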

Author(s):	Andreas Doerr and Christian Daniel and Duy Nguyen-Tuong and Alonso Marco and Stefan Schaal and Marc Toussaint and Sebastian Trimpe
Book Title:	Proceedings of the 1st Annual Conference on Robot Learning (CoRL)
Volume:	78
Pages:	227-238
Year:	2017
Month:	November
Editors:	Sergey Levine and Vincent Vanhoucke and Ken Goldberg

Department(s):	Autonome Motorik, Intelligent Control Systems
Research Project(s):	Learning Probabilistic Dynamics Models
BibTeX Type:	Conference Paper (conference)
Paper Type:	Conference

Event Name:	1st Annual Conference on Robot Learning
Event Place:	Mountain View, CA, USA

State:	Published

BibTeX:
@conference{doerr2017optimizing,
  title = {Optimizing Long-term Predictions for Model-based Policy Search},
  author = {Doerr, Andreas and Daniel, Christian and Nguyen-Tuong, Duy and Marco, Alonso and Schaal, Stefan and Toussaint, Marc and Trimpe, Sebastian},
  booktitle = {Proceedings of the 1st Annual Conference on Robot Learning (CoRL)},
  volume = {78},
  pages = {227-238},
  editor = {Sergey Levine and Vincent Vanhoucke and Ken Goldberg},
  month = nov,
  year = {2017},
  month_numeric = {11}
}