Reinforcement Learning: Beyond Optimality
Disciplines
Computer Sciences (100%)
Keywords
- Reinforcement Learning (Theory)
Reinforcement learning develops algorithms that can learn complex behavior, such as driving or playing a computer or board game. Some learning problems call for optimal behavior, where the goal is to do something as well as possible. For example, when learning to play a computer game, the goal might be to score the maximum number of points. Most reinforcement learning algorithms are indeed based on optimization, that is, they aim to maximize rewards (such as the points scored in a computer game). However, many learning problems do not actually contain an optimization component. For instance, an autonomous car that takes us to work need neither be as fast as possible nor take the shortest route. It is usually sufficient if it arrives on time. Most currently available learning algorithms can nevertheless be applied only after the problem has been formulated as an optimization problem. This not only means additional work; the resulting optimization problems are usually also hard to solve. For example, computing the shortest or fastest route to work (up to inches or seconds) is practically infeasible. Accordingly, most learning algorithms are hardly applicable to typical real-world problems. The project at hand aims to find algorithms that do not try to solve problems optimally but just well enough, and that do so much faster. In a first step, it will be necessary to work on suitable mathematical models, for which, in a second step, we shall develop learning algorithms that are more widely applicable to real-world problems.
The project aimed to develop algorithms that do not try to solve problems optimally but just well enough. The resulting algorithms were analyzed with mathematical methods and indeed show improved performance in the simplified problem setting. An algorithm that tries to learn optimal behavior can never be sure that it has already found the optimal strategy. Accordingly, it has to try out seemingly worse options every now and then. This is not necessary when the goal is only to reach a certain performance level. It also means that when the learner is given a suitable performance level that is reached only by the optimal strategy, the optimal strategy can be learned much more efficiently.
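The contrast between optimizing and aiming for a performance level can be made concrete with a small simulation. The following is a minimal Python sketch under stated assumptions, not the algorithm developed in the project: the setting is a Bernoulli multi-armed bandit, the optimizing baseline is the standard UCB1 rule, and the satisficing rule (exploit any arm whose empirical mean reaches a threshold, otherwise fall back to UCB1) is a simple illustrative heuristic. The names `run_bandit`, `satisficing_regret`, and `threshold` are hypothetical, and counting loss only for arms whose mean lies below the threshold is an assumption made for illustration.

```python
import math
import random


def run_bandit(means, horizon, threshold=None, seed=0):
    """Simulate a Bernoulli bandit and return the pull count of each arm.

    threshold=None  -> plain UCB1, which keeps exploring every arm.
    threshold=S     -> illustrative satisficing rule (an assumption, not the
                       project's method): exploit an arm whose empirical mean
                       is at least S; if no arm qualifies, fall back to UCB1.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k    # number of times each arm was played
    sums = [0.0] * k   # total reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play every arm once
        else:
            emp = [sums[i] / pulls[i] for i in range(k)]
            good = [i for i in range(k)
                    if threshold is not None and emp[i] >= threshold]
            if good:
                # Stick with the most-sampled arm that currently looks good enough.
                arm = max(good, key=lambda i: pulls[i])
            else:
                # UCB1 index: empirical mean plus an exploration bonus.
                arm = max(range(k),
                          key=lambda i: emp[i] + math.sqrt(2 * math.log(t) / pulls[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


def satisficing_regret(means, pulls, threshold):
    """Total shortfall below the threshold; arms at or above it cost nothing."""
    return sum(n * max(0.0, threshold - m) for m, n in zip(means, pulls))


if __name__ == "__main__":
    means = [0.3, 0.6, 0.9]  # hypothetical success probabilities of three arms
    for s in (None, 0.5):
        pulls = run_bandit(means, horizon=2000, threshold=s, seed=1)
        print("threshold:", s, "pulls:", pulls,
              "regret w.r.t. 0.5:", satisficing_regret(means, pulls, 0.5))
```

Choosing `threshold` strictly between the best and the second-best mean corresponds to the case described above, in which the performance level is reached only by the optimal strategy.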
- Montanuniversität Leoben - 100%
Research Output
- 5 Publications
- 2 Datasets & models
- 4 Scientific Awards
- 2024
  Title: Understanding the Gaps in Satisficing Bandits
  Type: Conference Proceeding Abstract
  Author: Ortner R
  Conference: European Workshop on Reinforcement Learning
- 2022
  Title: Adaptive Algorithms for Meta-Induction
  DOI: 10.1007/s10838-021-09590-2
  Type: Journal Article
  Author: Ortner R
  Journal: Journal for General Philosophy of Science
  Pages: 433-450
- 2022
  Title: Regret Bounds for Satisficing in Multi-Armed Bandit Problems
  Type: Conference Proceeding Abstract
  Author: Hajiabolhassan H
  Conference: European Workshop on Reinforcement Learning
- 2023
  Title: Regret Bounds for Satisficing in Multi-Armed Bandit Problems
  Type: Journal Article
  Author: Hajiabolhassan H
  Journal: Transactions on Machine Learning Research
- 2023
  Title: Online Regret Bounds for Satisficing in MDPs
  Type: Conference Proceeding Abstract
  Author: Hajiabolhassan H
  Conference: European Workshop on Reinforcement Learning
- 2024
  Title: Invitation as Speaker to Reinforcement Learning for Stochastic Networks Workshop in Toulouse
  Type: Personally asked as a keynote speaker to a conference
  Level of Recognition: Continental/International
- 2024
  Title: Poster EWRL 2024
  Type: Poster/abstract prize
  Level of Recognition: Continental/International
- 2023
  Title: Poster EWRL 2023
  Type: Poster/abstract prize
  Level of Recognition: Continental/International
- 2022
  Title: Poster EWRL 2022
  Type: Poster/abstract prize
  Level of Recognition: Continental/International