DELTA (Dynamically Evolving Long-Term Autonomy)
Disciplines
Computer Sciences (80%); Mathematics (20%)
Keywords
- Reinforcement learning
- Statistical learning theory
- Exploration
Many complex autonomous systems (e.g., electrical distribution systems) repeatedly select certain operations with the aim of achieving a given objective. The paradigm of reinforcement learning (RL) offers a powerful framework for such settings: the task of the learner is to learn optimal behavior, which can be, for example, a sequence of coordinated actions to reach a certain goal state, simply by observing the feedback the environment gives to the actions the learner chooses. Although RL has recently produced impressive results (e.g., achieving human-level performance in Atari games and beating the human world champion in the board game Go), most existing RL algorithms work only under strong assumptions: usually the environment is assumed to be stationary, the objective fixed, and trials end once the objective is met. The aim of this project is to advance the state of the art in RL by developing novel algorithms that relax these assumptions and work in changing environments with changing objectives. This makes lifelong learning possible, where the learner has to fulfill several different tasks over long periods of time. The proposed algorithms will address three crucial problems related to lifelong RL: exploration, planning, and task decomposition. Exploration deals with the problem of how to efficiently obtain a model of the environment without trying to satisfy any particular objective. Planning concerns the computation of an optimal strategy when such a model is given or has been found through exploration. Finally, task decomposition tries to decompose a complex objective into smaller subtasks, defining an individual objective for each subtask, such that the optimal strategies for the subtasks combine into an optimal strategy for the whole problem. The developed algorithms will be evaluated in two realistic scenarios of high practical importance: active network management for electrical distribution systems, and microgrid management.
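The RL loop described above - a learner choosing actions and adjusting its behavior from the environment's feedback - can be sketched with textbook tabular Q-learning. This is a generic illustration, not one of the project's algorithms; the tiny two-state MDP, the function name, and all parameter values below are made up for the example:

```python
import random

def q_learning(transitions, rewards, n_states, n_actions,
               episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a small deterministic MDP.

    transitions[s][a] gives the next state, rewards[s][a] the immediate
    reward. Generic textbook sketch, not the project's method.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # bounded episode length
            # epsilon-greedy: mostly exploit, sometimes explore
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = transitions[s][a], rewards[s][a]
            # temporal-difference update toward the one-step target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Two-state chain: action 1 moves toward (and stays in) a rewarding state.
transitions = [[0, 1], [0, 1]]
rewards = [[0.0, 0.0], [0.0, 1.0]]
Q = q_learning(transitions, rewards, n_states=2, n_actions=2)
```

After training, the learned values prefer action 1 in both states, i.e. the strategy that reaches and remains in the rewarding state.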
The aim of our project is to design control strategies with a lifelong learning capability. Such controls allow systems to adapt to changes in their environment and maintain nearly optimal performance. The project deals with controls implemented in autonomous systems, for example electrical distribution networks. Such a control repeatedly and continually selects actions with the aim of achieving a given objective, for instance avoiding a blackout while providing energy at low cost. For a static system - a system without significant changes - a nearly optimal control can be obtained, and for complicated controls, reinforcement learning is a method of choice for this optimization. But systems that are deployed for a long time are likely to be confronted with changes in their environment; a control with lifelong learning capability lets the system adapt to such changes and maintain nearly optimal performance. An example is the control of an electrical micro-grid that needs to balance renewable and conventional power sources while facing volatile power production and consumer behavior; such a micro-grid served as the test bed for our research. The main focus of our work in this collaborative research project is exploration: finding out which actions are beneficial in the long run and which actions should be avoided. There is a trade-off, though, between exploration and the exploitation of information obtained so far: during exploration the system performance may be sub-optimal, since new - and possibly bad - actions are selected. Furthermore, exploration in a changing environment is particularly challenging, since collected information may become invalid after a change. We extend methods from reinforcement learning to address these challenges, and we develop novel exploration strategies that automatically update information after a change.
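The exploration-exploitation trade-off in a changing environment can be illustrated with a standard bandit heuristic: an optimistic (UCB-style) index combined with exponential forgetting, so that stale observations are gradually discounted. This is a generic sketch under invented parameters, not the exploration strategies developed in the project:

```python
import math
import random

def discounted_ucb(reward_fn, n_arms, horizon, gamma=0.98, c=1.0, seed=0):
    """UCB-style selection with exponential forgetting for a
    nonstationary bandit. Older statistics are down-weighted by gamma
    each step, so estimates track a changing environment.
    Illustrative sketch only.
    """
    rng = random.Random(seed)
    counts = [0.0] * n_arms   # discounted pull counts
    sums = [0.0] * n_arms     # discounted reward sums
    choices = []
    for t in range(horizon):
        # forget: discount all past statistics
        counts = [gamma * n for n in counts]
        sums = [gamma * s for s in sums]
        total = sum(counts) + 1e-9

        def index(a):
            # optimistic index: discounted mean + exploration bonus
            if counts[a] < 1e-6:
                return float("inf")
            return sums[a] / counts[a] + c * math.sqrt(
                math.log(total + 1) / counts[a])

        a = max(range(n_arms), key=index)
        r = reward_fn(t, a, rng)
        counts[a] += 1.0
        sums[a] += r
        choices.append(a)
    return choices

# Arm 0 is best for the first half; arm 1 becomes best after an abrupt
# change at step 500 (deterministic rewards; rng kept for generality).
def reward_fn(t, a, rng):
    best = 0 if t < 500 else 1
    return 1.0 if a == best else 0.0

choices = discounted_ucb(reward_fn, n_arms=2, horizon=1000)
```

Because old statistics are forgotten, the strategy keeps re-trying the apparently inferior arm from time to time, notices the change, and switches: well before the change it mostly plays arm 0, and well after it mostly plays arm 1.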
More importantly, our methods can detect changes automatically and direct exploration accordingly. Reinforcement learning uses a reward model to train a strategy: the control is supposed to maximize the long-term reward. We show that this can also be used for incremental exploration, for example by a robot: first the immediate vicinity is explored, and then increasingly larger parts of the environment. In large environments, a compact and meaningful representation of the environment is extremely important for efficient learning: consider, for example, memorizing meaningful words versus random sequences of letters. Unfortunately, good representations for reinforcement learning are often not known. We have therefore devised an algorithm that automatically adapts and chooses the best-suited representation for its environment.
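As a simple illustration of triggering re-exploration after a change - a generic sliding-window test on reward statistics, not the detection mechanism developed in the project; the window size and threshold below are invented for the example - one can compare the mean reward of the most recent window against the preceding one:

```python
def detect_change(rewards, window=50, threshold=0.3):
    """Flag a change when the mean reward over the most recent window
    deviates from the mean over the preceding window by more than
    `threshold`. Returns the first step at which a change is flagged,
    or None. Illustrative sketch only.
    """
    for t in range(2 * window, len(rewards) + 1):
        ref = sum(rewards[t - 2 * window:t - window]) / window
        recent = sum(rewards[t - window:t]) / window
        if abs(recent - ref) > threshold:
            return t
    return None

# Mean reward drops abruptly at step 100; the test flags the change
# a few steps later, once enough post-change rewards are in the window.
stream = [1.0] * 100 + [0.2] * 100
change_at = detect_change(stream)
```

On detecting such a shift, a lifelong learner would discard or down-weight its stale estimates and resume exploration around the affected states.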
- Montanuniversität Leoben - 100%
- Bertrand Cornélusse, Université de Liège - Belgium
- Michal Valko, Inria Lille - Nord Europe - France
- Anders Jonsson, Universitat Pompeu Fabra - Spain