DELTA (Dynamically Evolving Long-Term Autonomy)
Disciplines
Computer Sciences (80%); Mathematics (20%)
Keywords
- Reinforcement learning
- Statistical learning theory
- Exploration
Many complex autonomous systems (e.g., electrical distribution systems) repeatedly select certain operations with the aim of achieving a given objective. The paradigm of reinforcement learning (RL) offers a powerful framework for such settings: the task of the learner is to learn optimal behavior, which can be, for example, a sequence of coordinated actions to reach a certain goal state, simply by observing the feedback the environment gives to the actions the learner chooses. Although RL has recently produced impressive results (e.g., achieving human-level performance in Atari games and beating the human world champion in the board game Go), most existing RL algorithms work only under strong assumptions: usually the environment is assumed to be stationary, the objective fixed, and trials end once the objective is met. The aim of this project is to advance the state of the art in RL by developing novel algorithms that relax these assumptions and work in changing environments with changing objectives. This makes lifelong learning possible, where the learner has to fulfill several different tasks over long periods of time. The proposed algorithms will address three crucial problems related to lifelong RL: exploration, planning, and task decomposition. Exploration deals with the problem of how to efficiently obtain a model of the environment without trying to satisfy any particular objective. Planning concerns the computation of an optimal strategy when such a model is given or has been found through exploration. Finally, task decomposition tries to decompose a complex objective into smaller subtasks, defining an individual objective for each subtask, such that the optimal strategies for the subtasks combine into an optimal strategy for the whole problem. The developed algorithms will be evaluated in two realistic scenarios of high practical importance: active network management for electrical distribution systems, and microgrid management.
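The RL loop described above - a learner choosing actions and adjusting its behavior from the environment's feedback - can be sketched with textbook tabular Q-learning. This is a generic illustration, not one of the project's algorithms; the tiny two-state MDP, the function name, and all parameter values below are made up for the example:

```python
import random

def q_learning(transitions, rewards, n_states, n_actions,
               episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a small deterministic MDP.

    transitions[s][a] gives the next state, rewards[s][a] the immediate
    reward. Generic textbook sketch, not the project's method.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # bounded episode length
            # epsilon-greedy: mostly exploit, sometimes explore
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = transitions[s][a], rewards[s][a]
            # temporal-difference update toward the one-step target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Two-state chain: action 1 moves toward (and stays in) a rewarding state.
transitions = [[0, 1], [0, 1]]
rewards = [[0.0, 0.0], [0.0, 1.0]]
Q = q_learning(transitions, rewards, n_states=2, n_actions=2)
```

After training, the learned values prefer action 1 in both states, i.e. the strategy that reaches and remains in the rewarding state.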
The aim of our project is to design control strategies with a lifelong learning capability. Such controls allow systems to adapt to changes in their environment and maintain nearly optimal performance. The project deals with controls implemented in autonomous systems, for example electrical distribution networks. Such a control repeatedly and continually selects actions with the aim of achieving a given objective, for instance avoiding a blackout while providing energy at low cost. For a static system - a system without significant changes - a nearly optimal control can be obtained, and for complicated controls, reinforcement learning is a method of choice for this optimization. But systems that are deployed for a long time are likely to be confronted with changes in their environment; a control with lifelong learning capability lets the system adapt to such changes and maintain nearly optimal performance. An example is the control of an electrical micro-grid that needs to balance renewable and conventional power sources while facing volatile power production and consumer behavior; such a micro-grid served as the test bed for our research. The main focus of our work in this collaborative research project is exploration: finding out which actions are beneficial in the long run and which actions should be avoided. There is a trade-off, though, between exploration and the exploitation of information obtained so far: during exploration the system performance may be sub-optimal, since new - and possibly bad - actions are selected. Furthermore, exploration in a changing environment is particularly challenging, since collected information may become invalid after a change. We extend methods from reinforcement learning to address these challenges, and we develop novel exploration strategies that automatically update information after a change.
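The exploration-exploitation trade-off in a changing environment can be illustrated with a standard bandit heuristic: an optimistic (UCB-style) index combined with exponential forgetting, so that stale observations are gradually discounted. This is a generic sketch under invented parameters, not the exploration strategies developed in the project:

```python
import math
import random

def discounted_ucb(reward_fn, n_arms, horizon, gamma=0.98, c=1.0, seed=0):
    """UCB-style selection with exponential forgetting for a
    nonstationary bandit. Older statistics are down-weighted by gamma
    each step, so estimates track a changing environment.
    Illustrative sketch only.
    """
    rng = random.Random(seed)
    counts = [0.0] * n_arms   # discounted pull counts
    sums = [0.0] * n_arms     # discounted reward sums
    choices = []
    for t in range(horizon):
        # forget: discount all past statistics
        counts = [gamma * n for n in counts]
        sums = [gamma * s for s in sums]
        total = sum(counts) + 1e-9

        def index(a):
            # optimistic index: discounted mean + exploration bonus
            if counts[a] < 1e-6:
                return float("inf")
            return sums[a] / counts[a] + c * math.sqrt(
                math.log(total + 1) / counts[a])

        a = max(range(n_arms), key=index)
        r = reward_fn(t, a, rng)
        counts[a] += 1.0
        sums[a] += r
        choices.append(a)
    return choices

# Arm 0 is best for the first half; arm 1 becomes best after an abrupt
# change at step 500 (deterministic rewards; rng kept for generality).
def reward_fn(t, a, rng):
    best = 0 if t < 500 else 1
    return 1.0 if a == best else 0.0

choices = discounted_ucb(reward_fn, n_arms=2, horizon=1000)
```

Because old statistics are forgotten, the strategy keeps re-trying the apparently inferior arm from time to time, notices the change, and switches: well before the change it mostly plays arm 0, and well after it mostly plays arm 1.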
More importantly, our methods can detect changes automatically and direct exploration accordingly. Reinforcement learning uses a reward model to train a strategy: the control is supposed to maximize the long-term reward. We show that this can also be used for incremental exploration, for example by a robot: first the immediate vicinity is explored, and then increasingly larger parts of the environment. In large environments, a compact and meaningful representation of the environment is extremely important for efficient learning: consider, for example, memorizing meaningful words versus random sequences of letters. Unfortunately, good representations for reinforcement learning are often not known. We have therefore devised an algorithm that automatically adapts and chooses the best-suited representation for its environment.
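As a simple illustration of triggering re-exploration after a change - a generic sliding-window test on reward statistics, not the detection mechanism developed in the project; the window size and threshold below are invented for the example - one can compare the mean reward of the most recent window against the preceding one:

```python
def detect_change(rewards, window=50, threshold=0.3):
    """Flag a change when the mean reward over the most recent window
    deviates from the mean over the preceding window by more than
    `threshold`. Returns the first step at which a change is flagged,
    or None. Illustrative sketch only.
    """
    for t in range(2 * window, len(rewards) + 1):
        ref = sum(rewards[t - 2 * window:t - window]) / window
        recent = sum(rewards[t - window:t]) / window
        if abs(recent - ref) > threshold:
            return t
    return None

# Mean reward drops abruptly at step 100; the test flags the change
# a few steps later, once enough post-change rewards are in the window.
stream = [1.0] * 100 + [0.2] * 100
change_at = detect_change(stream)
```

On detecting such a shift, a lifelong learner would discard or down-weight its stale estimates and resume exploration around the affected states.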
- Montanuniversität Leoben - 100%
- Bertrand Cornélusse, Université de Liège - Belgium
- Michal Valko, Inria Lille - Nord Europe - France
- Anders Jonsson, Universitat Pompeu Fabra - Spain