Project detail

Disciplines

Computer Sciences (100%)

Keywords

Reinforcement Learning (Theory)

Abstract

The research area of reinforcement learning develops algorithms that can learn complex behavior, such as driving or playing a computer or board game. Some of the learning problems considered aim at optimal behavior, where the goal is to do something as well as possible. For example, when learning to play a computer game the goal might be to score the maximum number of points. Most reinforcement learning algorithms are indeed based on optimization, that is, they aim to maximize rewards (such as the points scored in a computer game). However, many learning problems do not actually contain an optimization component. For example, an autonomous car that is supposed to get us to work need neither be as fast as possible nor take the shortest route. It would usually be sufficient if it arrives on time. To apply most of the currently available learning algorithms, it would still be necessary to formulate the problem as an optimization problem. This not only means additional work; the resulting optimization problems are usually also hard to solve. For example, computing the shortest or fastest route to work (up to inches or seconds) is practically infeasible. Accordingly, most learning algorithms are hardly applicable to typical real-world problems. The project at hand aims to find algorithms that do not solve problems optimally but just well enough, and do so much faster. In a first step it will be necessary to work on suitable mathematical models, for which in a second step we shall develop learning algorithms that are more widely applicable to real-world problems.

Final report

The research area of reinforcement learning develops algorithms that can learn complex behavior, such as driving or playing a computer or board game. Some of the learning problems considered aim at optimal behavior, where the goal is to do something as well as possible. For example, when learning to play a computer game the goal might be to score the maximum number of points. Most reinforcement learning algorithms are indeed based on optimization, that is, they aim to maximize rewards (such as the points scored in a computer game). However, many learning problems do not actually contain an optimization component. For example, an autonomous car that is supposed to get us to work need neither be as fast as possible nor take the shortest route. It would usually be sufficient if it arrives on time. To apply most of the currently available learning algorithms, it would still be necessary to formulate the problem as an optimization problem. This not only means additional work; the resulting optimization problems are usually also hard to solve. For example, computing the shortest or fastest route to work (up to inches or seconds) is practically infeasible. Accordingly, most learning algorithms are hardly applicable to typical real-world problems. The project at hand aimed at developing algorithms that do not try to solve problems optimally but just well enough. The algorithms found were analyzed with mathematical methods and indeed show improved performance in the simplified problem setting. Algorithms that aim to learn optimal behavior can never be sure that they have already found the optimal strategy. Accordingly, they have to try out seemingly worse options every now and then. However, this is not the case when one only wants to reach a certain performance level. This also means that when the learner is given a suitable performance level that is reached only by the optimal strategy, learning that strategy can be done much more efficiently.
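
The contrast between optimizing and merely reaching a performance level can be illustrated with a small multi-armed bandit sketch. This is a simplified illustration under stated assumptions, not the project's Sat-UCB or Sat-UcRL: rewards are assumed to be Bernoulli, the satisfaction level is assumed to be known, and the function names, the arm means, and the fallback to a UCB1-style index are all hypothetical choices made for this example.

```python
import math
import random


def satisficing_bandit(means, level, horizon, seed=0):
    """Simplified satisficing rule for Bernoulli arms (illustrative only).

    means   -- hypothetical true success probabilities, unknown to the learner
    level   -- satisfaction level the learner wants to reach
    horizon -- number of rounds to play
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            estimates = [totals[i] / pulls[i] for i in range(k)]
            good = [i for i in range(k) if estimates[i] >= level]
            if good:
                # Some arm already looks satisfying: keep playing it,
                # without any forced exploration of the other arms.
                arm = max(good, key=lambda i: estimates[i])
            else:
                # No arm looks satisfying yet: explore optimistically
                # using a UCB1-style index (assumed fallback).
                arm = max(
                    range(k),
                    key=lambda i: estimates[i]
                    + math.sqrt(2 * math.log(t) / pulls[i]),
                )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


def satisficing_regret(means, level, pulls):
    """Expected shortfall below the level, summed over all pulls."""
    return sum(n * max(0.0, level - m) for m, n in zip(means, pulls))
```

Calling `satisficing_bandit([0.2, 0.5, 0.9], level=0.8, horizon=1000)` and passing the result to `satisficing_regret` gives a rough feel for how little is lost once a satisfying arm has been identified. Setting `level` to a value that only the best arm reaches corresponds to the last case described in the report.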

Research institution(s)

Montanuniversität Leoben - 100%

Research Output

6 Publications
2 Datasets & models
4 Scientific Awards

Publications

Title	Online Regret Bounds for Satisficing in Markov Decision Processes
DOI	10.1287/moor.2023.0275
Type	Journal Article
Author	Hajiabolhassan H
Journal	Mathematics of Operations Research

Title	Understanding the Gaps in Satisficing Bandits
Type	Conference Proceeding Abstract
Author	Ortner R
Conference	European Workshop on Reinforcement Learning

Title	Online Regret Bounds for Satisficing in MDPs
Type	Conference Proceeding Abstract
Author	Hajiabolhassan H
Conference	European Workshop on Reinforcement Learning

Title	Regret Bounds for Satisficing in Multi-Armed Bandit Problems
Type	Journal Article
Author	Hajiabolhassan H
Journal	Transactions on Machine Learning Research

Title	Adaptive Algorithms for Meta-Induction
DOI	10.1007/s10838-021-09590-2
Type	Journal Article
Author	Ortner R
Journal	Journal for General Philosophy of Science
Pages	433-450

Title	Regret Bounds for Satisficing in Multi-Armed Bandit Problems
Type	Conference Proceeding Abstract
Author	Hajiabolhassan H
Conference	European Workshop on Reinforcement Learning

Datasets & models

Public Access
Title	Sat-UcRL for satisficing in MDPs
Type	Computer model/algorithm

Public Access
Title	Sat-UCB for satisficing in the multi-armed bandit setting
Type	Computer model/algorithm

Scientific Awards

Title	Invitation as Speaker to Reinforcement Learning for Stochastic Networks Workshop in Toulouse
Type	Personally invited as a keynote speaker at a conference
Level of Recognition	Continental/International

Title	Poster EWRL 2024
Type	Poster/abstract prize
Level of Recognition	Continental/International

Title	Poster EWRL 2023
Type	Poster/abstract prize
Level of Recognition	Continental/International

Title	Poster EWRL 2022
Type	Poster/abstract prize
Level of Recognition	Continental/International

Reinforcement Learning: Beyond Optimality
