Structure in Reinforcement Learning
Disciplines
Computer Sciences (50%); Mathematics (50%)
Keywords
- Reinforcement Learning
- Regret
- Markov Decision Processes
- Computational Learning Theory
Markov decision processes (MDPs) are a generic tool for modeling stochastic environments and have found various applications since their introduction in the 1950s by Richard Bellman. In the 1980s, Artificial Intelligence research discovered MDPs as models for learning optimal behavior in environments with "delayed feedback". While various algorithms for reinforcement learning in unknown MDPs have been developed, these methods have not achieved a breakthrough, despite some success stories like the backgammon algorithm of Gerald Tesauro.

The major practical obstacle to applying reinforcement learning in many potential domains is that typical algorithms are not efficient in environments with large state spaces. While many real-world problems could in principle be represented as MDPs, such representations usually have a large state space or a large action space (and often both). Typical reinforcement learning algorithms are thus too costly, as their complexity and regret (the total reward lost with respect to an optimal strategy) grow linearly or even polynomially with the number of states and actions. The reason is that, unlike humans, who can exploit symmetries and similarities in a learning problem, most reinforcement learning algorithms are unable to make use of the environment's structure.

The main focus of the proposed project lies in the investigation of similarity structures for MDPs and the development of algorithms able to exploit such structures. The availability of tools that can deal with structured environments will make reinforcement learning much more attractive for problem domains that are currently handled by heuristics, by task-specific expert knowledge, or not at all. Applications would thus no longer be restricted to toy problems or to typical reinforcement learning domains like game playing.
Instead, more general control problems in areas such as robotics and logistics would become accessible to reinforcement learning methods. The proposed project will concentrate on the following two topics: First, similarity structures for state aggregation in MDPs shall be examined and, in a further step, exploited by adaptive online aggregation algorithms. Second, these aggregation techniques shall be applied to MDPs with continuous state space, a setting of particular importance for applications. In the design and analysis of algorithms, suitable upper confidence bounds will play a key role. The project shall be conducted within the SequeL team of INRIA Lille, an interdisciplinary center for reinforcement learning. Collaboration will not be confined to the SequeL group, however, as INRIA hosts other groups in neighboring fields such as optimization, statistics, and control theory, which may contribute to the success of the project.
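To give a flavor of the upper confidence bounds referred to above, the sketch below shows the classic UCB1 index of Auer et al. for the bandit setting, a simpler analogue of the confidence bounds used for full MDPs. The function name and the sample numbers are illustrative assumptions, not the project's actual algorithm.

```python
import math

# Illustrative sketch of the UCB1 index: empirical mean plus a confidence
# radius. Acting optimistically on this index drives exploration of
# poorly sampled actions. Inputs below are hypothetical.

def ucb_index(mean_estimate, n_pulls, t):
    """Optimistic value of an action after n_pulls samples at time t."""
    return mean_estimate + math.sqrt(2.0 * math.log(t) / n_pulls)

# An action tried only twice gets a larger exploration bonus than one
# tried 50 times, even though its empirical mean is lower:
print(ucb_index(0.5, 2, 100) > ucb_index(0.6, 50, 100))  # True
```

The design choice is that the bonus shrinks as an action is sampled more often, so the index concentrates on the empirical mean exactly when that mean becomes trustworthy.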
- Inria Lille - Nord Europe - 100%
Research Output
- 42 Citations
- 2 Publications
2012
- Ortner R. "Regret Bounds for Restless Markov Bandits." Book chapter, Springer Nature, pp. 214-228. DOI 10.1007/978-3-642-34106-9_19
- Ortner R. "Adaptive aggregation for reinforcement learning in average reward Markov decision processes." Annals of Operations Research, pp. 321-336. DOI 10.1007/s10479-012-1064-y