Project detail

Grant DOI 10.55776/PAT6918624
Funding program Principal Investigator Projects
Status ongoing
Start September 8, 2025
End September 7, 2027
Funding amount € 187,741

Disciplines

Computer Sciences (80%); Mathematics (20%)

Keywords

Reinforcement Learning,
Satisficing,
Regret,
Multi-Armed Bandit,
Computational Learning Theory,
Markov decision process

Abstract

Reinforcement learning is a research area that develops algorithms able to learn complex behavior, such as driving a car or playing a computer or board game. Some of the learning problems considered aim at optimal behavior, where the goal is to do something as well as possible. For example, when learning to play a computer game, the goal might be to score the maximum number of points. Most reinforcement learning algorithms are indeed based on optimization, that is, they aim to maximize rewards (such as the points scored in a computer game).

However, many learning problems do not actually contain an optimization component. For example, an autonomous car that is supposed to get us to work need neither be as fast as possible nor take the shortest route. It would usually be sufficient if it arrives on time. To apply most of the currently available learning algorithms, such a problem would still have to be formulated as an optimization problem. This not only means additional work; the resulting optimization problems are usually also hard to solve. For example, computing the shortest or fastest route to work (up to inches or seconds) is practically infeasible. Accordingly, most learning algorithms are hardly applicable to typical real-world problems.

A precursor project investigated whether there is an advantage in solving problems not optimally but only sufficiently well. While it was known that an optimal strategy can only be learned approximately, it could be shown that a strategy that is sufficient with respect to a given satisficing level can be learned exactly. Remarkably, this also means that an optimal strategy can be learned exactly if the learner knows a sufficiency level that is satisfied only by the optimal strategy. Accordingly, the current project looks at reinforcement learning algorithms that try to adaptively determine such an appropriate satisficing level. These algorithms may be able to learn more efficiently, and hence much faster, in real-world problems.
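The difference between satisficing and optimizing can be illustrated with a toy multi-armed bandit. The following Python sketch is only an illustration under stated assumptions and is not the project's algorithm: the function names `satisficing_arms` and `simple_satisficing_bandit`, the Bernoulli reward model, and the explore-the-least-played-arm rule are all hypothetical choices made for this example.

```python
import random


def satisficing_arms(mean_rewards, level):
    """Return the indices of all arms whose true mean reward meets the level."""
    return [i for i, mu in enumerate(mean_rewards) if mu >= level]


def simple_satisficing_bandit(mean_rewards, level, steps, seed=0):
    """Toy learner (hypothetical, for illustration only).

    Plays the first arm whose empirical mean meets the satisficing level;
    if no arm currently looks sufficient, it explores the least-played arm.
    Rewards are assumed to be Bernoulli with the given means.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total reward collected per arm
    pulls = []         # sequence of arms played

    for _ in range(steps):
        # Arms that have been tried and whose empirical mean is sufficient.
        good = [i for i in range(k)
                if counts[i] > 0 and sums[i] / counts[i] >= level]
        if good:
            arm = good[0]
        else:
            arm = min(range(k), key=lambda i: counts[i])
        reward = 1.0 if rng.random() < mean_rewards[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls


means = [0.2, 0.5, 0.9]
# With level 0.5, two arms are sufficient; with level 0.9, only the best one is.
print(satisficing_arms(means, 0.5))
print(satisficing_arms(means, 0.9))
print(len(simple_satisficing_bandit(means, 0.5, steps=200)))
```

When the level is chosen so that only the best arm satisfies it (0.9 in this example), the set of sufficient arms coincides with the optimal arm, which mirrors the observation in the abstract that knowing such a level turns satisficing into optimization.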

Research institution(s)

Montanuniversität Leoben - 100%

From Satisficing to Optimization in Reinforcement Learning
